Abstract. We use the mathematical language of sheaf theory to give a unified treatment
of non-locality and contextuality, in a setting which generalizes the familiar probability tables 
used in non-locality theory to arbitrary measurement covers; this includes Kochen-Specker 
configurations and more. We show that contextuality, and non-locality as a special case, 
correspond exactly to obstructions to the existence of global sections. We describe a linear 
algebraic approach to computing these obstructions, which allows a systematic treatment of 
arguments for non-locality and contextuality. We distinguish a proper hierarchy of strengths of 
no-go theorems, and show that three leading examples — due to Bell, Hardy, and Greenberger, 
Horne and Zeilinger, respectively — occupy successively higher levels of this hierarchy. A 
general correspondence is shown between the existence of local hidden-variable realizations 
using negative probabilities, and no-signalling; this is based on a result showing that the 
linear subspaces generated by the non-contextual and no-signalling models, over an arbitrary 
measurement cover, coincide. Maximal non-locality is generalized to maximal contextuality, and 
characterized in purely qualitative terms, as the non-existence of global sections in the support. 
A general setting is developed for Kochen-Specker type results, as generic, model-independent 
proofs of maximal contextuality, and a new combinatorial condition is given, which generalizes 
the 'parity proofs' commonly found in the literature. We also show how our abstract setting can 
be represented in quantum mechanics. This leads to a strengthening of the usual no-signalling 
theorem, which shows that quantum mechanics obeys no-signalling for arbitrary families of 
commuting observables, not just those represented on different factors of a tensor product. 
1. Introduction

Non-locality and contextuality are fundamental features of physical theories, which contradict the 
intuitions underlying classical physics. They are, in particular, prominent features of quantum 
mechanics, and the goal of the classic no-go theorems by Bell [1], Kochen-Specker [2], et al. is to
show that they are necessary features of any theory whose experimental predictions agree with 
those of quantum mechanics. 

Bell's insights into non-locality have been seminal to the current developments in quantum 
information, where entanglement is viewed as a key informatic resource; and there has also been 
considerable recent work on experimental tests for contextuality [3, 4].

In the present paper, we study these notions from a novel perspective, which yields new 
insights and results. Our approach has the following notable features: 

• The importance of Bell's theorem and related results is that they apply, not just to quantum 
mechanics, but to all theories with certain structural properties. We introduce a general 
mathematical setting, completely independent of Hilbert space, which strengthens this 
feature, and allows results to be proved in considerable generality. 

• We study non-locality and contextuality in a unified setting. The idea that non-locality can 
be seen as a particular form of contextuality, and specific results obtaining Bell-type non-locality
from Kochen-Specker configurations, can be found in references such as [5, 6, 7]. An
important recent contribution in this direction is [8], which studies non-contextual inequalities
as a generalization of Bell inequalities. 

Our approach focusses on structural aspects. It offers a general, systematic and 
mathematically robust setting in which non-locality and contextuality are treated in a unified 
fashion; our definitions and results specialize to yield standard formulations of either as special 
cases, but subsume both. 

• We use the mathematics of sheaf theory to analyze the structure of non-locality and 
contextuality. Sheaf theory is pervasive in modern mathematics, allowing the passage from 
local to global [9]. Starting from a simple experimental scenario, and the kind of probabilistic 
models familiar from discussions of Bell's theorem, Popescu-Rohrlich boxes [10], etc., we show 
that there is a very direct, compelling formalization of these notions in sheaf-theoretic terms. 
Moreover, on the basis of this formulation, we show that the phenomena of non-locality and 
contextuality can be characterized precisely in terms of obstructions to the existence of 
global sections. We give linear algebraic methods for computing these obstructions. 

These ideas lead in turn to a number of novel insights into non-locality and contextuality: 

• We are able to distinguish three degrees of strength of non-locality: standard probabilistic
non-locality, exhibited by the original example of Bell; possibilistic non-locality, 
exemplified by the well-known Hardy model [11]; and strong contextuality. These three 
properties form a strict hierarchy; strong contextuality implies possibilistic non-locality, which 
implies probabilistic non-locality, but the converse implications fail. In fact, we show that 
the Bell model is probabilistically but not possibilistically non-local; the Hardy model is 
possibilistically non-local but not strongly contextual; and the GHZ models [12], for all
numbers of parties greater than 2, are strongly contextual. Thus we have a hierarchy 

Bell < Hardy < GHZ. 

Moreover, Ray Lai has shown (private communication) that the only bipartite no-signalling 
devices satisfying strong contextuality are the PR boxes, thus giving a new characterization 
of these super-quantum devices. 

• We show that strong contextuality is equivalent to a quantitative notion of maximal 
contextuality, which has been studied in the special case of Bell-type scenarios as maximal 
non-locality. We use this equivalence to characterize maximal contextuality, and in 
particular maximal non-locality, in terms of a boolean satisfiability problem naturally 
associated with a probabilistic model, for the case of dichotomic measurements, and more 
generally in terms of a constraint satisfaction problem. 
• We apply our linear algebraic methods for constructing global sections to the issue of giving
local hidden-variable realizations using negative probabilities [13, 14, 15, 16]. We show
that there is an equivalence between the existence of such realizations, and the no-signalling 
property. 

• We give a general perspective on Kochen-Specker type theorems as generic (model-independent)
proofs of strong contextuality. We show the general combinatorial structure of
these results, and make connections to graph theory, leading to a notion of Kochen-Specker 
graphs, defined in purely graph-theoretic terms. 

• We prove a general result (Theorem 8.1), which shows a strict equivalence between the
realization of a system by a factorizable hidden-variable model, and the existence of a 
global section which glues together a certain compatible family on a presheaf. Factorizability 
is a general property, which subsumes both Bell-locality and a form of non-contextuality at 
the level of distributions as special cases. This means that the whole issue of non-locality and 
contextuality can be translated into a canonical mathematical form, in terms of obstructions 
to the existence of certain global sections. This opens up the possibility of applying the 
powerful methods of sheaf theory to studying the structure of these notions. 

• We show in detail how the abstract setting we use can be represented in quantum mechanics; 
hence our results apply to all the standard situations. One interesting point which emerges 
from this is that the property of compatibility of a family of sections on a presheaf 
corresponds to a form of no-signalling [17]. This form of no-signalling subsumes, but is
more general than, the usual notion; it applies to arbitrary families of commuting observables, 
not just those represented on different factors of a tensor product. We therefore prove a 
generalized no-signalling theorem, showing that quantum mechanics does satisfy this 
more general property. 

The remainder of this paper is organized as follows. The basic setting is motivated and 
laid out in Section 2. The correspondence between global sections and (deterministic) local 
hidden variables is explained in Section 3. The linear algebraic method for constructing global 
sections (or determining their non-existence) is presented in Section 4, together with the results 
relating to the Bell and Hardy models. The equivalence between no-signalling and the existence 
of local hidden-variable realizations with negative probabilities is proved in Section 5. Strong
contextuality, the results relating to the GHZ models and the hierarchy between Bell, Hardy and
GHZ, and the connections with maximal non-locality, are presented in Section 6. The general
combinatorial structure of Kochen-Specker-type theorems is studied in Section 7. In Section 8,
we prove our general result relating factorizable hidden-variable models to the existence of global
sections. Representations in quantum mechanics, and the generalized form of no-signalling, are 
treated in Section 9. Section 10 contains a postlude, summarizing what has been done, discussing 
related work, and describing some further directions. 

The mathematical background needed to read this paper is quite modest. In particular, 
only the bare definitions of category and functor are required. A brief appendix reviews these 
definitions. 

2. The Setting 

2.1. A Basic Scenario 

Our starting point is the idealized situation depicted in the following diagram. 
[Diagram: two agents, Alice and Bob, each selecting a measurement and observing an outcome.]



There are several agents or experimenters, who can each select one of several different 
measurements to perform, and observe one of several different outcomes. These agents may or may 
not be spatially separated. When a system is prepared in a certain fashion and measurements are 
selected, some corresponding outcomes will be observed. These individual occurrences or 'runs' of 
the system are the basic events. Repeated runs allow relative frequencies to be tabulated, which 
can be summarized by a probability distribution on events for each selection of measurements. 
We shall call such a family of probability distributions, one for each choice of measurements, an 
empirical model. 

As an example of such a model, consider the following table. 



    A    B    (0,0)  (1,0)  (0,1)  (1,1)
    a    b    1/2    0      0      1/2
    a'   b    3/8    1/8    1/8    3/8
    a    b'   3/8    1/8    1/8    3/8
    a'   b'   1/8    3/8    3/8    1/8



The intended scenario here is that Alice can choose between measurements a and a', and Bob can 
choose b or b'. Thus the measurement contexts are 

{a,b}, {a',b}, {a,b'}, {a',b'}, 

and these index the rows of the table. Each measurement has possible outcomes 0 or 1. The
matrix cell at row (a', b) and column (0,1) corresponds to the event where Alice performs a' and
observes the outcome 0, and Bob performs b and observes the outcome 1. This can be described
by the function

$$\{a' \mapsto 0,\ b \mapsto 1\}.$$

The cells of the row indexed by {a',b} correspond to the set of functions $O^C$, where C is the
measurement context {a', b}, and O = {0, 1} is the set of outcomes.‡

Each row of the table specifies a probability distribution on events for a given choice of
measurements, i.e. on the set $O^C$ where the row is indexed by C. For example, the event

$$\{a' \mapsto 0,\ b \mapsto 1\}$$

is specified to have the probability 1/8.
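The reading of the table described above can be sketched in a few lines of Python. The dictionary layout and the names `BELL_TABLE`, `OUTCOMES` and `probability` are illustrative assumptions, not notation from the paper; the numbers are those of the table.

```python
from fractions import Fraction as F

# Hypothetical encoding (not from the paper): a context is a tuple of
# measurement labels; its row lists the weights in column order.
OUTCOMES = [(0, 0), (1, 0), (0, 1), (1, 1)]
BELL_TABLE = {
    ("a", "b"):   [F(1, 2), F(0), F(0), F(1, 2)],
    ("a'", "b"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a", "b'"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a'", "b'"): [F(1, 8), F(3, 8), F(3, 8), F(1, 8)],
}

def probability(context, event):
    """Probability of an event, given as a dict measurement -> outcome."""
    joint = tuple(event[m] for m in context)
    return BELL_TABLE[context][OUTCOMES.index(joint)]

# Every row is a probability distribution on the events of its context.
assert all(sum(row) == 1 for row in BELL_TABLE.values())

# The event {a' -> 0, b -> 1} discussed in the text.
print(probability(("a'", "b"), {"a'": 0, "b": 1}))  # 1/8
```

Exact fractions are used so that the row sums can be compared with 1 without rounding error.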

The basic ingredients of our formalism will be the measurement contexts, the events, 
and the distributions on events. A model of a particular measurement scenario will be given by 
specifying a set of measurements $X$, a family $\mathcal{M}$ of measurement contexts, and for each context
$C \in \mathcal{M}$, a distribution on the events $O^C$.

We shall now proceed to formalize these ideas. Simple as this setting may seem, it does have 
significant mathematical structure, which our formalization will enable us to articulate. 

‡ $O^C$ denotes the set of functions from C to O. This and a few other set-theoretic notations are explained in the
Appendix.
2.2. Events

We shall fix a set $X$ of measurements. We shall also fix a set $O$ of possible outcomes for each
measurement.§ Throughout this paper, we shall assume that $X$ and $O$ are finite.

For each set of measurements $U \subseteq X$, a section over $U$ is a function $s : U \to O$. Such
a section describes the event in which the measurements in $U$ were performed, and the outcome
$s(m)$ was observed for each $m \in U$.

We shall write $\mathcal{E} : U \mapsto O^U$ for the assignment of the set of sections over $U$ to each set of
measurements $U$. There is also a natural action by restriction. If $U \subseteq U'$, there is a map

$$\mathrm{res}^{U'}_{U} : \mathcal{E}(U') \to \mathcal{E}(U) :: s \mapsto s|U.$$

Note that $\mathrm{res}^{U}_{U} = \mathrm{id}_U$, and if $U \subseteq U' \subseteq U''$, then

$$\mathrm{res}^{U'}_{U} \circ \mathrm{res}^{U''}_{U'} = \mathrm{res}^{U''}_{U}.$$

Altogether, this says that $\mathcal{E}$ is a presheaf, i.e. a functor $\mathcal{E} : \mathcal{P}(X)^{\mathrm{op}} \to \mathbf{Set}$.

$\mathcal{E}$ has an important additional property. Suppose we are given a family of sets $\{U_i\}_{i \in I}$ with
$\bigcup_{i \in I} U_i = U$; i.e. the family $\{U_i\}$ is a cover of $U$. Suppose moreover that we are given a family
of sections $\{s_i \in \mathcal{E}(U_i)\}_{i \in I}$, which is compatible in the following sense: for all $i, j \in I$,

$$s_i | U_i \cap U_j = s_j | U_i \cap U_j.$$

Then there is a unique section $s \in \mathcal{E}(U)$ such that $s|U_i = s_i$ for all $i \in I$. This says that we can
glue together local data which is compatible in the sense of agreeing on overlaps; moreover, this 
glued section is uniquely determined. 

This gluing property is known as the sheaf condition; it says that $\mathcal{E}$ is a sheaf, which we
shall refer to as the sheaf of events.

The fact that this sheaf condition holds for $\mathcal{E}$ is quite trivial, since we are simply looking at
functions on a discrete space; we can always glue together partial functions which agree on their 
overlaps, just by taking the union of their graphs. 
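As a minimal sketch of this gluing construction, sections can be modelled as Python dictionaries from measurement labels to outcomes. The function names `compatible`, `glue` and `restrict` are chosen here for illustration and do not come from the paper.

```python
def compatible(sections):
    """True if every pair of sections agrees on the overlap of their domains."""
    return all(
        s[m] == t[m]
        for s in sections
        for t in sections
        for m in s.keys() & t.keys()
    )

def glue(sections):
    """Glue a compatible family by taking the union of the graphs."""
    if not compatible(sections):
        raise ValueError("sections disagree on an overlap")
    glued = {}
    for s in sections:
        glued.update(s)
    return glued

def restrict(s, U):
    """The restriction s|U of a section s to a subset U of its domain."""
    return {m: s[m] for m in U}

family = [{"a": 0, "b": 1}, {"b": 1, "c": 0}]
s = glue(family)
print(s)  # {'a': 0, 'b': 1, 'c': 0}
```

Restricting the glued section to the domain of each member of the family returns that member, which is the uniqueness half of the sheaf condition in this discrete setting.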

2.3. Distributions 

To capture the idea that empirically we observe statistical rather than deterministic behaviour 
in microphysical systems, we shall consider distributions on events. It will be advantageous to 
allow some generality in the notion of distribution we shall consider, by taking the algebra of 
probabilistic 'weights' as a parameter. 

A commutative semiring is a structure $(R, +, 0, \cdot, 1)$, where $(R, +, 0)$ and $(R, \cdot, 1)$ are
commutative monoids, and moreover multiplication distributes over addition:

$$x \cdot (y + z) = x \cdot y + x \cdot z.$$

There are three main examples of semirings which will be of interest: the reals

$$(\mathbb{R}, +, 0, \times, 1),$$

the non-negative reals

$$(\mathbb{R}_{\geq 0}, +, 0, \times, 1),$$

and the booleans

$$\mathbb{B} = (\{0, 1\}, \vee, 0, \wedge, 1).$$

We fix a semiring $R$. Given a set $X$, the support of a function $\phi : X \to R$ is the set of $x \in X$
such that $\phi(x) \neq 0$. We write $\mathrm{supp}(\phi)$ for the support of $\phi$. An $R$-distribution on $X$ is a function
$d : X \to R$ which has finite support, and such that

$$\sum_{x \in X} d(x) = 1.$$

§ We could allow a different set of outcomes for each individual measurement, but we will not need this extra 
generality. 
Note that the finite support condition ensures that this sum is well-defined. We write $\mathcal{D}_R(X)$ for
the set of $R$-distributions on $X$.

In the case of the semiring $\mathbb{R}_{\geq 0}$, this is the set of probability distributions with finite support
on $X$. In the case of the booleans $\mathbb{B}$, it is the set of non-empty finite subsets of $X$; thus possibilistic
or relational models [18, 19] are also covered. In the case of the reals $\mathbb{R}$, it is the set of signed
measures with finite support, allowing for 'negative probabilities' [13, 14, 15, 16].
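The three cases can be compared in a small sketch. Here a semiring is represented only by its addition, zero and unit, which is all the normalization condition needs; the names `ARITHMETIC`, `BOOLEANS`, `is_distribution`, `support` and `possibilistic` are assumptions made for this illustration.

```python
import operator
from fractions import Fraction as F
from functools import reduce

# (add, zero, one); ARITHMETIC serves for both the reals and the
# non-negative reals, which differ only in the weights they allow.
ARITHMETIC = (operator.add, 0, 1)
BOOLEANS = (operator.or_, False, True)

def is_distribution(d, semiring):
    """Check that the weights of the finite dict d sum to the unit."""
    add, zero, one = semiring
    return reduce(add, d.values(), zero) == one

def support(d, zero=0):
    """The set of points carrying non-zero weight."""
    return {x for x, w in d.items() if w != zero}

def possibilistic(d):
    """Boolean distribution recording which points lie in the support of d."""
    return {x: (w != 0) for x, w in d.items()}

row = {(0, 0): F(1, 2), (1, 0): F(0), (0, 1): F(0), (1, 1): F(1, 2)}
signed = {"x": F(-1, 2), "y": F(3, 2)}  # a 'negative probability'

print(is_distribution(row, ARITHMETIC))               # True
print(is_distribution(possibilistic(row), BOOLEANS))  # True
print(is_distribution(signed, ARITHMETIC))            # True
```

A boolean distribution is normalized exactly when at least one point is possible, which matches the description of $\mathcal{D}_{\mathbb{B}}(X)$ as the non-empty finite subsets of $X$.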

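Given a function $f : X \to Y$, we define

$$\mathcal{D}_R(f) : \mathcal{D}_R(X) \to \mathcal{D}_R(Y) :: d \mapsto \Big[\, y \mapsto \sum_{f(x) = y} d(x) \,\Big].$$

This is easily seen to be functorial:

$$\mathcal{D}_R(g \circ f) = \mathcal{D}_R(g) \circ \mathcal{D}_R(f), \qquad \mathcal{D}_R(\mathrm{id}_X) = \mathrm{id}_{\mathcal{D}_R(X)},$$

so we have a functor $\mathcal{D}_R : \mathbf{Set} \to \mathbf{Set}$.‖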

We can compose this functor with the event sheaf $\mathcal{E} : \mathcal{P}(X)^{\mathrm{op}} \to \mathbf{Set}$, to form a presheaf
$\mathcal{D}_R\mathcal{E} : \mathcal{P}(X)^{\mathrm{op}} \to \mathbf{Set}$, which assigns to each set of measurements $U$ the set $\mathcal{D}_R(\mathcal{E}(U))$ of
distributions on $U$-sections. It is worth writing out the functorial action of this presheaf explicitly.
Given $U \subseteq U'$ we have a map

$$\mathcal{D}_R\mathcal{E}(U') \to \mathcal{D}_R\mathcal{E}(U) :: d \mapsto d|U,$$

where for each $s \in \mathcal{E}(U)$:

$$d|U(s) := \sum_{s' \in \mathcal{E}(U'),\ s'|U = s} d(s').$$

Thus $d|U$ is the marginal of the distribution $d$, which assigns to each section $s$ in the smaller
context $U$ the sum of the weights of all sections $s'$ in the larger context which restrict to $s$.
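The marginal can be computed directly from this description. In the sketch below a section is encoded as a frozenset of (measurement, outcome) pairs so that it can serve as a dictionary key; this encoding, and the names `sec` and `marginal`, are assumptions of the example. The semiring addition is passed as a parameter, defaulting to ordinary addition.

```python
import operator
from fractions import Fraction as F

def sec(assignment):
    """Hashable encoding of a section given as a dict measurement -> outcome."""
    return frozenset(assignment.items())

def marginal(d, U, add=operator.add, zero=0):
    """Restrict a distribution d on sections over U' to a subset U of U'.

    Each section s over U receives the sum of the weights of all
    sections s' over U' that restrict to s.
    """
    out = {}
    for s_prime, weight in d.items():
        s = frozenset((m, o) for (m, o) in s_prime if m in U)
        out[s] = add(out.get(s, zero), weight)
    return out

# The row of the example table indexed by the context {a', b}.
d = {
    sec({"a'": 0, "b": 0}): F(3, 8),
    sec({"a'": 1, "b": 0}): F(1, 8),
    sec({"a'": 0, "b": 1}): F(1, 8),
    sec({"a'": 1, "b": 1}): F(3, 8),
}
alice = marginal(d, {"a'"})
print(alice[sec({"a'": 0})])  # 1/2
```

For boolean weights one would call `marginal(d, U, add=operator.or_, zero=False)`, so that a section is possible exactly when some extension of it is possible.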

2.4. Measurement Covers

A crucial point is that it may not be possible, in general, to perform all measurements together. 
This is implicit in the idea that each agent makes a choice of measurement from several alternatives; 
only the measurements which are chosen are actually performed. In the situation where the agents 
are spatially separated, and the measurements which each performs are localized to their own site, 
the measurements at the different parts can be performed jointly. In general, we must allow 
for more complex situations, where compatible sets of measurements may overlap in complicated 
ways. 

We shall now introduce the notion of measurement cover, which formalizes the idea that only 
certain measurements can be performed jointly. 

A measurement cover $\mathcal{M}$ on the set $X$ of measurements is a family of subsets of $X$ such
that:

• $\mathcal{M}$ covers $X$: $\bigcup \mathcal{M} = X$.

• $\mathcal{M}$ is an anti-chain, i.e. $C, C' \in \mathcal{M}$ and $C \subseteq C'$ implies $C = C'$.

We think of $X$ as a set of labels for the basic measurements in an experiment. A set $C \in \mathcal{M}$
is a measurement context; a set of measurements which can be performed jointly. We shall
focus on the maximal compatible sets of measurements, hence the anti-chain condition. Any
compatible family of measurements in $X$ will be included in some element of $\mathcal{M}$.
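The two defining conditions are easy to test mechanically. The following is a sketch only; the function name `is_measurement_cover` is hypothetical.

```python
def is_measurement_cover(X, M):
    """Check that the family M covers X and is an anti-chain under inclusion."""
    contexts = [frozenset(C) for C in M]
    covers = frozenset().union(*contexts) == frozenset(X)
    # Anti-chain: no context is a proper subset of another.
    antichain = not any(C < D for C in contexts for D in contexts)
    return covers and antichain

X = {"a", "a'", "b", "b'"}
M = [{"a", "b"}, {"a'", "b"}, {"a", "b'"}, {"a'", "b'"}]
print(is_measurement_cover(X, M))            # True
print(is_measurement_cover(X, M + [{"a"}]))  # False: {a} lies inside {a, b}
```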

It should be noted that measurement covers provide a very general way of representing 
compatibility relationships. Of course, a physical interpretation in particular circumstances will 
give rise to specific structures of this kind. We shall discuss quantum representations of the 
formalism in detail in Section 9. We also discuss the conceptual consequences of our (and related)
results concerning compatibility in the Postlude in Section 10.

‖ This functor forms part of the well-known distribution monad; see e.g. [20] for references.
2.4.1. Bell-Type Scenarios. We shall now describe a particular class of measurement covers which
arises in the formulation of Bell-type theorems on non-locality, and in the study of PR-boxes and
other non-local devices.

Consider a disjoint family $\{X_i\}_{i \in I}$. We think of $I$ as labelling the parts of a system, which
may be space-like separated; $X_i$ is the set of basic measurements which can be performed at part
$i$. We form the disjoint union $X$ of this family. We define $\mathcal{M}$ to be those subsets of $X$ containing
exactly one measurement from each part. Thus we regard measurements performed in different
parts of the system as compatible, but do not allow for compatible measurements in the same
part.
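A Bell-type cover can be generated as a Cartesian product of the parts. This is a sketch assuming the parts are given as disjoint lists of labels; `bell_cover` is a name introduced here.

```python
from itertools import product

def bell_cover(parts):
    """All sets containing exactly one measurement from each part.

    `parts` is a list of pairwise disjoint lists of measurement labels.
    """
    return [frozenset(choice) for choice in product(*parts)]

cover = bell_cover([["a", "a'"], ["b", "b'"]])
print(len(cover))  # 4
```

With n parts of k measurements each, the product has k^n elements, one per choice of a measurement at every part.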

2.4.2. Kochen-Specker-Type Scenarios. Measurement covers are general enough to cover the
situations arising in Kochen-Specker style proofs of contextuality, as well as the Bell-type scenarios
for non-locality.

Consider the set $X = \{m_1, \ldots, m_{18}\}$, and the measurement cover $\mathcal{M}$ whose elements are the
columns of the following table:



    m_1   m_1   m_8   m_8   m_2   m_9   m_16  m_16  m_17
    m_2   m_5   m_9   m_11  m_5   m_11  m_17  m_18  m_18
    m_3   m_6   m_3   m_7   m_13  m_14  m_4   m_6   m_13
    m_4   m_7   m_10  m_12  m_14  m_15  m_10  m_12  m_15



The importance of this example is that it can be realized by unit vectors in $\mathbb{R}^4$, such that each
measurement context $C$ in $\mathcal{M}$ is an orthogonal set of vectors. This structure is used in the
18-vector proof of the Kochen-Specker theorem in [21].

We shall discuss proofs of the Kochen-Specker theorem in detail later in the paper; from the
point of view of the abstract, 'logical' structure in Section 7, and as regards the interpretation in
quantum mechanics in Section 9.2.
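As a sanity check on the table, the short sketch below builds the cover from its columns and inspects a few combinatorial properties. The subscripts are as transcribed in this text, and the variable names are illustrative.

```python
from collections import Counter

# Rows of the table; entry k stands for the measurement m_k.
ROWS = [
    [1, 1, 8, 8, 2, 9, 16, 16, 17],
    [2, 5, 9, 11, 5, 11, 17, 18, 18],
    [3, 6, 3, 7, 13, 14, 4, 6, 13],
    [4, 7, 10, 12, 14, 15, 10, 12, 15],
]

# The measurement contexts are the columns.
contexts = [frozenset(col) for col in zip(*ROWS)]

# How many contexts each measurement belongs to.
occurrences = Counter(m for C in contexts for m in C)

print(len(contexts))              # 9
print(set(occurrences.values()))  # {2}
```

With the subscripts read this way, the nine contexts each contain four measurements, together they cover all eighteen, and every measurement lies in exactly two contexts.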

2.5. Empirical Models 

We shall now show how the intuitive scenario described at the beginning of this section can be 
captured formally, using the mathematical structure we have developed. 

Suppose we are given a measurement cover $\mathcal{M}$. Recall that $\mathcal{M}$ covers $X$, i.e. $\bigcup \mathcal{M} = X$.

We shall define a no-signalling empirical model for $\mathcal{M}$ to be a compatible family for
the cover $\mathcal{M}$ with respect to the presheaf $\mathcal{D}_R\mathcal{E}$. This means that for each measurement context
$C \in \mathcal{M}$, there is a distribution $e_C \in \mathcal{D}_R\mathcal{E}(C)$. Moreover, this family of distributions is compatible
in the sense of the sheaf condition; for all $C, C' \in \mathcal{M}$,

$$e_C | C \cap C' = e_{C'} | C \cap C'.$$

In the case of Bell-type scenarios, this is readily seen to coincide with the usual notion of
no-signalling. For example, in the bipartite case, consider contexts $C = \{m_a, m_b\}$, $C' = \{m_a, m'_b\}$,
with a choice of measurement each for Alice and Bob. Fix $s_0 \in \mathcal{E}(\{m_a\})$, which assigns some
outcome to $m_a$. Then the compatibility condition implies that

$$\sum_{s \in \mathcal{E}(C),\ s|m_a = s_0} e_C(s) \;=\; \sum_{s' \in \mathcal{E}(C'),\ s'|m_a = s_0} e_{C'}(s').$$

This says that the probability for Alice to get the outcome specified by $s_0$ for her measurement
$m_a$ is the same, whether we marginalize over the possible outcomes for Bob when he makes the
measurement $m_b$, or the measurement $m'_b$. In other words, Bob's choice of measurement cannot
influence Alice's outcome. This is exactly the standard definition of no-signalling.

We should also note, as a boundary case, that $\mathcal{E}(\emptyset)$ is a one-element set, and $\mathcal{D}_R\mathcal{E}(\emptyset)$ is
again a one-element set. Thus if contexts have empty intersection, the compatibility condition is
trivially satisfied.
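The compatibility condition can be checked by comparing marginals on every pairwise intersection of contexts. The sketch below does this for the probability table of Section 2.1; the data layout and the names `rows`, `marginal` and `is_no_signalling` are assumptions made for the example.

```python
from fractions import Fraction as F
from itertools import combinations

COLUMNS = [(0, 0), (1, 0), (0, 1), (1, 1)]

def rows(table):
    """Turn {context: [weights in column order]} into {context: {joint: weight}}."""
    return {C: dict(zip(COLUMNS, row)) for C, row in table.items()}

def marginal(context, dist, U):
    """Marginalize a distribution on joint outcomes of `context` onto U."""
    keep = [i for i, m in enumerate(context) if m in U]
    out = {}
    for joint, w in dist.items():
        key = frozenset((context[i], joint[i]) for i in keep)
        out[key] = out.get(key, 0) + w
    return out

def is_no_signalling(model):
    """Compatibility: marginals agree on the intersection of any two contexts."""
    for (C1, d1), (C2, d2) in combinations(model.items(), 2):
        U = set(C1) & set(C2)
        if marginal(C1, d1, U) != marginal(C2, d2, U):
            return False
    return True

bell = rows({
    ("a", "b"):   [F(1, 2), 0, 0, F(1, 2)],
    ("a'", "b"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a", "b'"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a'", "b'"): [F(1, 8), F(3, 8), F(3, 8), F(1, 8)],
})
# A toy family in which Alice's outcome for a depends on Bob's choice.
signalling = rows({("a", "b"): [1, 0, 0, 0], ("a", "b'"): [0, 1, 0, 0]})

print(is_no_signalling(bell))        # True
print(is_no_signalling(signalling))  # False
```

Contexts with empty intersection are compared on the empty section only, where both marginals equal the total weight 1, in line with the boundary case noted above.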
The general notion of compatible family for arbitrary covers applies to a much wider range
of situations than Bell-type scenarios; later we will show that the empirical models which can be 
represented as quantum mechanical systems satisfy this more general form of no-signalling. 

We shall only consider no-signalling models in this paper, so henceforth we shall simply speak 
of empirical models. 

2.6. Examples 

We shall now show how some standard examples appear in our formalism. 

Consider a bipartite Bell-type scenario, where Alice has two possible measurements {a, a'},
and Bob has {b, b'}. There are two possible outcomes, 0 or 1, for each measurement.

Thus there are four maximal measurement contexts:

{a,b}, {a',b}, {a,b'}, {a',b'}
which index the rows of the following table: 





              (0,0)  (1,0)  (0,1)  (1,1)
    (a,b)     p_1    p_2    p_3    p_4
    (a',b)    p_5    p_6    p_7    p_8
    (a,b')    p_9    p_10   p_11   p_12
    (a',b')   p_13   p_14   p_15   p_16



The rows of this table correspond to the sets of sections $\mathcal{E}(C)$, where $C$ ranges over the maximal
measurement contexts. Thus, for example, the cell labelled with $p_2$ corresponds to the section
$\{a \mapsto 1,\ b \mapsto 0\}$ in $\mathcal{E}(C)$, where $C = \{a, b\}$.

The table specifies a weight $p_i$ for each of these sections; in the standard case of probabilistic
models, these will be non-negative reals, such that the values along each row sum to 1, and hence
form a probability distribution. The distributions $e_C$ for each maximal context $C$ collectively
specify what we are calling an empirical model; and the no-signalling condition corresponds exactly
to the compatibility condition on this family of distributions.

As a specific example, consider the following table: 



    A    B    (0,0)  (1,0)  (0,1)  (1,1)
    a    b    1/2    0      0      1/2
    a'   b    3/8    1/8    1/8    3/8
    a    b'   3/8    1/8    1/8    3/8
    a'   b'   1/8    3/8    3/8    1/8



We shall use this model later to give a proof of Bell's theorem [1].
As another example, consider: 





              (0,0)  (1,0)  (0,1)  (1,1)
    (a,b)     1/2    0      0      1/2
    (a',b)    1/2    0      0      1/2
    (a,b')    1/2    0      0      1/2
    (a',b')   0      1/2    1/2    0






This is a PR box [10]. 

We can also consider models over other semirings of weights. For example, the following is 
a specification of the possibilistic version of a non-local Hardy model [11], with weights in the
boolean semiring. It can be viewed as specifying the support of a standard probabilistic Hardy 
model. 
              (0,0)  (1,0)  (0,1)  (1,1)
    (a,b)     1      1      1      1
    (a',b)    0      1      1      1
    (a,b')    0      1      1      1
    (a',b')   1      1      1      0






3. Global Sections 

We shall now show how the structures we have exposed in our mathematical description of 
empirical models can be brought to bear on the analysis of non-locality and contextuality. 

We have already observed that the presheaf of events $\mathcal{E}$ is in fact a sheaf; it is natural to ask
if the same holds for the presheaf $\mathcal{D}_R\mathcal{E}$. Indeed, since empirical models are compatible families for
this presheaf, to say that the sheaf condition holds for such a family $\{e_C\}_{C \in \mathcal{M}}$, with respect to a
measurement cover $\mathcal{M}$, is to say that there exists a global section $d \in \mathcal{D}_R\mathcal{E}(X)$, defined on the
entire set of measurements $X$. Such a global section defines a distribution on the set $\mathcal{E}(X) = O^X$,
which specifies assignments of outcomes to all measurements. Moreover, this distribution must
restrict to yield the probabilities specified by the empirical model on all the measurement contexts
in $\mathcal{M}$: i.e. for all $C \in \mathcal{M}$, $d|C = e_C$. Thus the existence of a global section for the empirical
model corresponds exactly to the existence of a distribution defined on all measurements, which
marginalizes to yield the empirically observed probabilities. This places the idea of extendability
of probability distributions, as studied in pioneering work by Fine [22], in a canonical and general
mathematical form.

We can say more than this. A global assignment $s \in \mathcal{E}(X) = O^X$, i.e. a global section of
the sheaf $\mathcal{E}$, can be seen as a canonical form of deterministic hidden variable, which assigns
a definite outcome to each measurement, independent of the measurement context in which it
appears. This yields an assignment $s|C$ for each measurement context $C$. A global section
$d \in \mathcal{D}_R\mathcal{E}(X)$ specifies a distribution on this canonical set of deterministic hidden variables. Each
$s \in O^X$ induces a distribution $\delta_s \in \mathcal{D}_R\mathcal{E}(X)$, where $\delta_s(s) = 1$, and $\delta_s(s') = 0$ if $s \neq s'$. The
distribution induced by $s$ on each measurement context $C$ is $\delta_s|C$; note that $\delta_s|C = \delta_{s|C}$. Now we
have:

$$e_C(s) = d|C(s) = \sum_{s' \in \mathcal{E}(X),\ s'|C = s} d(s') = \sum_{s' \in \mathcal{E}(X)} \delta_{s'|C}(s) \cdot d(s') = \sum_{s' \in \mathcal{E}(X)} \delta_{s'}|C(s) \cdot d(s').$$

Thus the condition that $d|C = e_C$ for each measurement context $C$ says exactly that we reproduce
the empirically observed probabilities $e_C(s)$ by averaging over the hidden variables with respect
to the distribution $d$.
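To make the averaging concrete, the sketch below marginalizes a distribution on global assignments to every context of the bipartite scenario of Section 2.6. The encoding of global assignments as 4-tuples ordered as a, a', b, b', and the name `induced_model`, are assumptions of this example.

```python
from fractions import Fraction as F
from itertools import product

MEASUREMENTS = ["a", "a'", "b", "b'"]
COVER = [("a", "b"), ("a'", "b"), ("a", "b'"), ("a'", "b'")]

def induced_model(d):
    """Compute e_C = d|C for every context C in COVER.

    d maps global assignments (tuples of outcomes ordered as
    MEASUREMENTS) to weights.
    """
    model = {}
    for C in COVER:
        idx = [MEASUREMENTS.index(m) for m in C]
        row = {joint: F(0) for joint in product((0, 1), repeat=len(C))}
        for s, weight in d.items():
            row[tuple(s[i] for i in idx)] += weight
        model[C] = row
    return model

# Equal mixture of two deterministic hidden variables:
# every outcome is 0, or every outcome is 1.
d = {(0, 0, 0, 0): F(1, 2), (1, 1, 1, 1): F(1, 2)}
model = induced_model(d)
print(model[("a'", "b'")][(0, 0)])  # 1/2
```

Every context of this induced model carries the row (1/2, 0, 0, 1/2), so by construction it is an empirical model with a global section.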

It is also easily verified that for each context $C$, and $s' \in \mathcal{E}(C)$:

$$\delta_s|C(s') = \prod_{m \in C} \delta_{s|\{m\}}(s'|\{m\}).$$

That is, the probability distribution determined by $s$ factors as a product of the probabilities
assigned to the individual measurements, independent of the context in which they appear. We
shall define a general version of this factorizability property later, in Section 8.

If we specialize to the case of Bell-type scenarios, we see that factorizability corresponds to
Bell locality [1]. For example, in a context {a,b}, where a is a measurement for Alice, and b
a measurement for Bob, then for a joint assignment of outcomes $\{a \mapsto o_1,\ b \mapsto o_2\}$, it says that
the probability of this joint outcome determined by the hidden variable $s$ is the product of the
probabilities it determines for $\{a \mapsto o_1\}$ and $\{b \mapsto o_2\}$. In other situations, it corresponds to a
form of non-contextuality at the level of distributions.

We can summarize this discussion as follows: 
Proposition 3.1 The existence of a global section for an empirical model implies the existence
of a local (or non-contextual) deterministic hidden-variable model which realizes it.¶

We note also that, as we shall show later (see Theorem 8.1), apparently more general forms
of realization of empirical models by factorizable hidden-variable models, in which the hidden
variables are not required to be deterministic, are in fact equivalent to canonical realizations
by global sections. Thus the entire issue of non-locality and contextuality (i.e. the existence of
empirical models which have no such realizations) is equivalently formulated as the non-existence
of global sections for the corresponding compatible families. 

Thus we have a characterization of the phenomena of locality and non-contextuality in terms of 
obstructions to the existence of global sections, a central issue in the pervasive applications 
of sheaves in geometry, topology, analysis and number theory. This opens the door to the use of 
the powerful methods of sheaf theory in the study of non-locality and contextuality. 

4. Existence of Global Sections 

The discussion in the previous section motivates the following problem:

    Given an empirical model, determine if it has a global section.

We shall give a general linear-algebraic method for answering this question, which as we have
seen is equivalent to the question of whether there exists a realization of the model by local or
non-contextual hidden variables.

4.1. The Incidence Matrix

We are given a measurement cover $\mathcal{M}$. The first (and main) step is to construct a matrix $M$ of
0's and 1's, which we shall call the incidence matrix of $\mathcal{M}$. This matrix is defined using only
$\mathcal{M}$ and the event sheaf $\mathcal{E}$, and can be applied to any empirical model expressed as a compatible
family for $\mathcal{M}$, with respect to any distribution functor $\mathcal{D}_R$.

To define the incidence matrix, we firstly form the disjoint union $\coprod_{C \in \mathcal{M}} \mathcal{E}(C)$ of all the
sections over the contexts in $\mathcal{M}$, and then specify an enumeration $s_1, \ldots, s_p$ of this set. We also
specify an enumeration $t_1, \ldots, t_q$ of all the global assignments $t_j \in O^X$, i.e. the global sections of
the sheaf $\mathcal{E}$. We then form the $(p \times q)$-matrix $M$, with entries defined as follows:

$$M[i,j] = \begin{cases} 1, & t_j|C = s_i \quad (s_i \in \mathcal{E}(C)) \\ 0, & \text{otherwise.} \end{cases}$$



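Conceptually, this matrix represents the tuple of restriction maps

$$\mathcal{E}(X) \to \prod_{C \in \mathcal{M}} \mathcal{E}(C) :: s \mapsto (s|C)_{C \in \mathcal{M}}.$$

To see this, note that for each $C$ we have the embedding

$$\mathcal{E}(C) \to \mathcal{P}(\mathcal{E}(C)) :: s \mapsto \{s\}.$$

Thus we obtain a map $\mathcal{E}(X) \to \prod_{C \in \mathcal{M}} \mathcal{P}(\mathcal{E}(C))$. Now we use the isomorphism

$$\prod_{i \in I} \mathcal{P}(X_i) \;\cong\; \mathcal{P}\Big(\coprod_{i \in I} X_i\Big)$$

to obtain a map $\mathcal{E}(X) \to \mathcal{P}\big(\coprod_{C \in \mathcal{M}} \mathcal{E}(C)\big)$. Such a map is the same thing as a relation

$$R \subseteq \mathcal{E}(X) \times \coprod_{C \in \mathcal{M}} \mathcal{E}(C).$$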

¶ Note that 'deterministic hidden variable model' means that the model is deterministic for each fixed value of the
hidden variable.
              (0,0)  (1,0)  (0,1)  (1,1)
    (a,b)     s_1    s_2    s_3    s_4
    (a',b)    s_5    s_6    s_7    s_8
    (a,b')    s_9    s_10   s_11   s_12
    (a',b')   s_13   s_14   s_15   s_16

Figure 1. Enumeration of sections



The incidence matrix is the boolean matrix representation of this relation. Viewing it as a
0/1-matrix over the semiring $R$, it acts by matrix multiplication on distributions in $\mathcal{D}_R\mathcal{E}(X)$,
represented as row vectors:

$$d \mapsto (d|C)_{C \in \mathcal{M}}.$$

Thus the image of this map will be the set of families $\{e_C\}_{C \in \mathcal{M}}$ which arise from global sections.



4.2. Example: Bell-Type Scenarios

We shall illustrate this construction for Bell-type scenarios. Following standard terminology, we shall refer to a Bell-type scenario with $n$ parts, each of which has $k$ possible measurements, each with $l$ possible outcomes, as of $(n, k, l)$-type. Note that for a system of $(n, k, l)$-type, there are $k^n$ measurement contexts, for each of which there are $l^n$ possible assignments of outcomes. Thus there are $(kl)^n$ sections over the contexts. The set of all measurements is of size $kn$, and there are $l^{kn}$ global assignments. Thus the incidence matrix in this case will be of size $(kl)^n \times l^{kn}$. Each row of the matrix will contain $l^{(k-1)n}$ 1's.
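The counting in the preceding paragraph can be checked mechanically. The following is a minimal sketch, not the authors' construction: the function name `incidence_matrix` and the particular enumeration order of rows and columns are assumptions made for illustration only.

```python
from itertools import product

def incidence_matrix(n, k, l):
    """0/1 incidence matrix of an (n, k, l)-type Bell scenario.

    Rows are sections (context, outcomes); columns are global assignments.
    The enumeration order is an arbitrary choice made for this sketch.
    """
    # A context picks one of the k measurements at each of the n parts.
    contexts = list(product(range(k), repeat=n))
    # A section assigns one of l outcomes to each measurement in a context.
    sections = [(c, o) for c in contexts for o in product(range(l), repeat=n)]
    # A global assignment gives an outcome to all k*n measurements:
    # g[part][measurement] is the outcome assigned at that part.
    assignments = list(product(product(range(l), repeat=k), repeat=n))

    def restricts_to(g, c, o):
        # True when the global assignment g restricted to context c equals o.
        return all(g[part][c[part]] == o[part] for part in range(n))

    return [[1 if restricts_to(g, c, o) else 0 for g in assignments]
            for (c, o) in sections]

M = incidence_matrix(2, 2, 2)
```

For the $(2,2,2)$ case this yields a $16 \times 16$ matrix with four 1's in every row, in agreement with the sizes $(kl)^n \times l^{kn}$ and $l^{(k-1)n}$ stated above.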

We shall describe the (2,2,2) case explicitly. In this case, the matrix is 16 x 16. We shall use the enumeration of sections over contexts given in the table in Figure 1. We shall also use an evident enumeration of global sections obtained by viewing them as binary strings, where the i'th bit indicates the assignment of an outcome to the i'th measurement, listed as a, a', b, b'.

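The incidence matrix is then as follows.

    1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
    1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0
    0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
    1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
    0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0
    0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0
    0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
    1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
    0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
    0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0
    0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1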






This matrix has rank 9. We shall give a general formula for the rank of the incidence matrix in Proposition 5.6, and apply it to the $(n, 2, 2)$ cases in Section 5.3.
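The stated rank can be reproduced with exact rational arithmetic. This is a sketch under stated assumptions: the helper names are hypothetical, and the column order used in `decode` (index $= 8b + 4a + 2b' + a'$, counting from 0) is inferred from the equations quoted in the proof of Proposition 4.2 rather than given explicitly in the text.

```python
from fractions import Fraction

CONTEXTS = (("a", "b"), ("a'", "b"), ("a", "b'"), ("a'", "b'"))
OUTCOMES = ((0, 0), (1, 0), (0, 1), (1, 1))  # column order of Figure 1

def decode(j):
    """Global assignment for 0-based column j (assumed bit order, see above)."""
    return {"b": (j >> 3) & 1, "a": (j >> 2) & 1,
            "b'": (j >> 1) & 1, "a'": j & 1}

def incidence_matrix():
    """Rows follow Figure 1: context-major, then outcome order."""
    return [[1 if (decode(j)[m1], decode(j)[m2]) == o else 0
             for j in range(16)]
            for (m1, m2) in CONTEXTS for o in OUTCOMES]

def rank(matrix):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

M = incidence_matrix()
```

With this enumeration, row 1 has its 1's in columns 1 to 4 and row 6 in columns 2, 4, 6, 8, matching the equations used later for the Bell model.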

We note that in the case of Bell-type scenarios, incidence matrices have been studied as 'transfer matrices'; see also [24]. Our account generalizes this to arbitrary measurement covers, and also provides a clear conceptual derivation of these matrices in terms of the restriction maps.

4.3. Global Sections as Solutions of Linear Systems 

Now we consider an empirical model $\{e_C\}$, defined with respect to the distribution functor $\mathcal{D}_R$. Such a model assigns a weight in the semiring $R$ to each section $s_i \in \mathcal{E}(C)$. Thus it can be specified by a vector $\mathbf{V}$ of length $p$, where $\mathbf{V}[i] = e_C(s_i)$. We can also introduce a vector $\mathbf{X}$ of length $q$ of 'unknowns', one for each global section $t_j \in \mathcal{E}(X)$. Now a solution for the linear system $\mathbf{M}\mathbf{X} = \mathbf{V}$ will be a vector of values in $R$, one for each $t_j$. To ensure that this vector is a distribution, we augment $\mathbf{M}$ with an extra row, every entry in which is 1, and similarly augment $\mathbf{V}$ with an extra element, also set to 1. A solution for this augmented system will enforce the constraint

$$\mathbf{X}[1] + \cdots + \mathbf{X}[q] = 1$$

and hence ensure that the assignment of weights defines a distribution on $\mathcal{E}(X)$. The remaining equations ensure that this distribution restricts to yield the weight specified by the empirical model for each section $s_i$.

Proposition 4.1 Let $\mathbf{M}'$ be the augmented incidence matrix, and $\mathbf{V}'$ the augmented vector corresponding to an empirical model $e$ over the distribution functor $\mathcal{D}_R$. Solutions to the system of equations $\mathbf{M}'\mathbf{X} = \mathbf{V}'$ in $R$ correspond bijectively to global sections for $e$.

We also note that in the case of Bell-type scenarios of $(n, k, l)$-type, it is not necessary to use the augmented system; solutions of the equation $\mathbf{M}\mathbf{X} = \mathbf{V}$ will automatically be distributions. This follows easily from the regular structure of the incidence matrices for these cases.

4.4. Examples

We shall consider a number of examples, based on the models of (2, 2, 2)-type discussed earlier.

4.4.1. The Bell Model

We look again at the Bell model

                (0,0)    (1,0)    (0,1)    (1,1)
    (a, b)      1/2      0        0        1/2
    (a', b)     3/8      1/8      1/8      3/8
    (a, b')     3/8      1/8      1/8      3/8
    (a', b')    1/8      3/8      3/8      1/8

We are interested in finding a solution in the non-negative reals, i.e. a probability distribution on the global assignments $\mathcal{E}(X)$. This amounts to solving the linear system over the reals, subject to the constraint $\mathbf{X} \geq 0$; i.e. to a linear programming problem. It is easy in this case to give a direct argument that there is no such solution, and hence that the above model has no hidden-variable realization, thus proving Bell's theorem.

Proposition 4.2 The Bell model has no global section. 



We thank one of the journal referees for bringing this connection to our attention.

Proof We focus on 4 out of the 16 equations, corresponding to rows 1, 6, 11 and 13 of the incidence matrix. We write $X_i$ rather than $\mathbf{X}[i]$.

$$\begin{aligned}
X_1 + X_2 + X_3 + X_4 &= 1/2 \\
X_2 + X_4 + X_6 + X_8 &= 1/8 \\
X_3 + X_4 + X_{11} + X_{12} &= 1/8 \\
X_1 + X_5 + X_9 + X_{13} &= 1/8
\end{aligned}$$

Adding the last three equations yields

$$X_1 + X_2 + X_3 + 2X_4 + X_5 + X_6 + X_8 + X_9 + X_{11} + X_{12} + X_{13} = 3/8.$$

Since all these terms must be non-negative, the left-hand side of this equation must be greater than or equal to the left-hand side of the first equation, yielding the required contradiction. □
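The closing step of the proof can be written as one chain. This is only a restatement of the argument above, with the equations for rows 1, 6, 11 and 13 taken as given: dropping the non-negative terms $X_4, X_5, X_6, X_8, X_9, X_{11}, X_{12}, X_{13}$ from the summed equation gives the inequality, and the resulting claim $3/8 \geq 1/2$ is false.

```latex
\frac{3}{8}
  = X_1 + X_2 + X_3 + 2X_4 + X_5 + X_6 + X_8 + X_9 + X_{11} + X_{12} + X_{13}
  \;\geq\; X_1 + X_2 + X_3 + X_4
  = \frac{1}{2}
```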

4.4.2. The Hardy Model

Now we consider the possibilistic version of the Hardy model, specified by the following table.

                (0,0)    (1,0)    (0,1)    (1,1)
    (a, b)      1        1        1        1
    (a', b)     0        1        1        1
    (a, b')     0        1        1        1
    (a', b')    1        1        1        0

This is obtained from a standard probabilistic Hardy model by replacing all positive entries by 1; thus it can be interpreted as the support of the probabilistic model.

In this case, we are interested in the existence of a solution over the boolean semiring. This corresponds to a boolean satisfiability problem. For example, the equation specified by the first row of the incidence matrix corresponds to the clause

$$X_1 \vee X_2 \vee X_3 \vee X_4$$

while the fifth yields the formula

$$\neg X_1 \wedge \neg X_3 \wedge \neg X_5 \wedge \neg X_7.$$

A solution is an assignment of boolean values to the variables which simultaneously satisfies all these formulas. Again, it is easy to see by a direct argument that no such assignment exists.

Proposition 4.3 The possibilistic Hardy model has no global section over the booleans.

Proof We focus on the four formulas corresponding to rows 1, 5, 9 and 16 of the incidence matrix:

$$\begin{aligned}
& X_1 \vee X_2 \vee X_3 \vee X_4 \\
& \neg X_1 \wedge \neg X_3 \wedge \neg X_5 \wedge \neg X_7 \\
& \neg X_1 \wedge \neg X_2 \wedge \neg X_9 \wedge \neg X_{10} \\
& \neg X_4 \wedge \neg X_8 \wedge \neg X_{12} \wedge \neg X_{16}
\end{aligned}$$

Since every disjunct in the first formula appears as a negated conjunct in one of the other three formulas, there is no satisfying assignment. □

To understand the significance of this result, we note the following general fact.

Proposition 4.4 Let $\mathbf{M}$ be the incidence matrix for a cover $\mathcal{M}$, and let $\mathbf{V}$ be the vector of non-negative reals corresponding to a probabilistic model over $\mathcal{M}$. Let $\mathbf{V}_b$ be the boolean vector obtained by replacing each non-zero element of $\mathbf{V}$ by 1. If the system $\mathbf{M}\mathbf{X} = \mathbf{V}$ has a solution over the non-negative reals, then the system $\mathbf{M}\mathbf{X} = \mathbf{V}_b$ has a solution over the booleans.

Proof This follows simply from the fact that the map from the non-negative reals to the booleans which takes all non-zero elements to 1 is a semiring homomorphism. □

It follows that, if the support of a probabilistic model has no global section with respect to the boolean distribution functor $\mathcal{D}_{\mathbb{B}}$, then the probabilistic model itself has no global section with respect to the probability distribution functor $\mathcal{D}_{\mathbb{R}_{\geq 0}}$. Thus the argument given above implies that the probabilistic Hardy model also has no global section, and hence is non-local.

The converse to Proposition 4.4 is false. Indeed, the Bell model, which as we have seen has no probabilistic global section, does have a boolean global section for its support. This is easy to show directly, but also follows from general results in the literature, which show that Hardy models are complete for the (2,2,2)-type cases, and in particular that there must be at least three sections excluded from the support in order for non-locality to hold, while the Bell model has only two zero entries.
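The contrast between the two supports can be checked directly. The sketch below is illustrative only: the function names are hypothetical, and the column order in `decode` is an assumption inferred from the equations in the proof of Proposition 4.2. It decides solvability of the boolean system without search: every column that meets a row with entry 0 must be switched off, and a solution exists exactly when the remaining columns still cover every row with entry 1.

```python
CONTEXTS = (("a", "b"), ("a'", "b"), ("a", "b'"), ("a'", "b'"))
OUTCOMES = ((0, 0), (1, 0), (0, 1), (1, 1))

def decode(j):
    # Assumed column order: 0-based index = 8*b + 4*a + 2*b' + a'.
    return {"b": (j >> 3) & 1, "a": (j >> 2) & 1,
            "b'": (j >> 1) & 1, "a'": j & 1}

def incidence_matrix():
    return [[1 if (decode(j)[m1], decode(j)[m2]) == o else 0
             for j in range(16)]
            for (m1, m2) in CONTEXTS for o in OUTCOMES]

def boolean_global_section(support):
    """Return a boolean solution X of MX = support, or None if none exists."""
    M = incidence_matrix()
    # A column is allowed only if it never meets a row whose entry is 0.
    allowed = [all(support[i] or not M[i][j] for i in range(16))
               for j in range(16)]
    # The largest candidate switches on every allowed column.
    covered = all(any(M[i][j] and allowed[j] for j in range(16))
                  for i in range(16) if support[i])
    return [int(x) for x in allowed] if covered else None

# Rows s_1 .. s_16 of Figure 1, read off the two tables in Section 4.4.
HARDY = [1, 1, 1, 1,  0, 1, 1, 1,  0, 1, 1, 1,  1, 1, 1, 0]
BELL_SUPPORT = [1, 0, 0, 1,  1, 1, 1, 1,  1, 1, 1, 1,  1, 1, 1, 1]
```

Under these assumptions the Hardy support admits no boolean solution, while the Bell support does, namely the eight global assignments in which a and b receive the same outcome.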

In this sense, we can say that the Hardy model satisfies a stronger non-locality property than the Bell model. In general, we say that a probabilistic model is probabilistically non-extendable if it has no global section over $\mathcal{D}_{\mathbb{R}_{\geq 0}}$, and possibilistically non-extendable if its support has no global section over $\mathcal{D}_{\mathbb{B}}$. We have seen that possibilistic non-extendability is strictly stronger than probabilistic non-extendability.

5. Negative Probabilities 

We shall now consider the question of extendability over real-valued distributions $\mathcal{D}_{\mathbb{R}}$, i.e. signed probability measures. Formally, this simply amounts to solving the linear system over the
reals, with no additional constraints. Conceptually, this allows the introduction of negative 
probabilities in the extended model. Of course, these marginalize to yield standard non-negative 
probabilities in the measurement contexts stipulated by the empirical model. Thus the usual 
relative-frequency interpretation of the actually observed statistics is maintained. 

The appearance of negative probabilities in quantum mechanics has a long history, which we 
shall sketch in the Postlude (Section 10). In this section, we shall prove that all empirical models 
are extendable with respect to signed probability measures. In fact, there is an equivalence between 
extendability under signed measures and no-signalling. Note that the class of no-signalling models 
is strictly larger than the empirical models of this type that arise in quantum mechanics. For 
example, it includes the superquantum Popescu-Rohrlich boxes [10].

The result therefore shows that negative probabilities, in themselves, cannot characterize 
quantum mechanics. This runs contrary to an impression which might be gained from the 
literature. For example, Feynman writes: "The only difference between a probabilistic classical 
world and the equations of the quantum world is that somehow or other it appears as if the 
probabilities would have to go negative ..." [IHl P-480]. In fact, the introduction of negative 
probabilities yields the entire no-signalling world. 

5.1. Solving the Linear System Over the Reals 

Given an empirical model $e$ over $\mathcal{D}_{\mathbb{R}}$, our aim is to find a global section. The existence of such a global section for $e$, which is represented by the real vector $\mathbf{V}$, reduces to the existence of a solution for the linear system $\mathbf{M}\mathbf{X} = \mathbf{V}$ over the reals, with no additional constraints.

Note that there is no semiring homomorphism from the reals to the booleans. Indeed, if there were such a homomorphism $h$, we would have:

$$0 = h(0) = h(1 + (-1)) = h(1) \vee h(-1) = 1 \vee h(-1) = 1.$$

A similar argument shows that there is no homomorphism from the reals to the non-negative reals. Thus there is no result analogous to Proposition 4.4, and it is possible for the linear system to be solvable over the reals, while no solution exists over the non-negative reals or the booleans.

We shall now show that such solutions exist for all no-signalling probabilistic empirical models, 
over arbitrary measurement covers. This substantially generalizes previous results, e.g. Theorem 1 
in [M]. 

We introduce some notation. We fix a standard set of outcomes $O := \{1, \ldots, l\}$. Given a cover $\mathcal{M}$, we define the set of partial contexts:

$$\mathcal{U} := \{U \subseteq X \mid \exists C \in \mathcal{M}.\; U \subseteq C\}.$$

For each $U \in \mathcal{U}$ and $p \geq 0$, we define

$$\mathcal{E}^{(p)}(U) := \{s \in \mathcal{E}(U) \mid |s^{-1}(\{1\})| \leq p\},$$

the set of sections which map at most $p$ measurements to the outcome 1.

Given a section $s$, we write $s_{m:=j}$ for the section defined by:

$$s_{m:=j}(m) = j, \qquad s_{m:=j}(m') = s(m') \quad (m' \neq m).$$

Finally, given an empirical model $e$, we define

$$e^{(p)} := \{e_U(s) \mid U \in \mathcal{U},\; s \in \mathcal{E}^{(p)}(U)\}.$$

Proposition 5.1 Let $e$ be a probabilistic model over $\mathcal{M}$. Then $e$ is linearly determined by $e^{(0)}$.

Proof We shall prove that we can infer $e^{(p+1)}$ from $e^{(p)}$; the fact that $e$ is determined by $e^{(0)}$ then follows by induction.

Consider $U \in \mathcal{U}$, $s \in \mathcal{E}^{(p)}(U)$, and $m \in U$. Let $U' := U \setminus \{m\}$. Using compatibility,

$$e_{U'}(s|U') = \sum_{j} e_U(s_{m:=j}).$$

Hence

$$e_U(s_{m:=1}) = e_{U'}(s|U') - \sum_{j \neq 1} e_U(s_{m:=j}). \qquad (1)$$

All the terms on the RHS of this equation are in $e^{(p)}$, while every element of $e^{(p+1)}$ can be written in the form of the LHS.

Unwinding the induction, every number $e_C(s)$ is given by a linear combination of values in $e^{(0)}$ obtained by back-substitution in (1). □

We can consider probabilistic models as real vectors, as in the previous section. For a given cover $\mathcal{M}$, the dimension of the ambient vector space will be $t := \sum_{C \in \mathcal{M}} l^{|C|}$.

Proposition 5.2 The dimension of the subspace of $\mathbb{R}^t$ spanned by the vectors arising from probabilistic empirical models is bounded above by $D := \sum_{U \in \mathcal{U}} (l-1)^{|U|}$.

Proof From Proposition 5.1 we know that any probabilistic empirical model is determined by a vector of $D$ numbers. Moreover, the corresponding map $L : \mathbb{R}^D \to \mathbb{R}^t$ defined by the equations (1) is linear. Hence the subspace spanned by the probabilistic empirical models is contained in the image of $L$, and has dimension $\leq D$. □

This gives us an upper bound on the dimension of the linear subspace generated by the 'no-signalling polytope' over the measurement cover $\mathcal{M}$. We shall now give a lower bound on the dimension of the linear space spanned by the non-contextual models over $\mathcal{M}$, i.e. those arising from global sections.

Given a cover $\mathcal{M}$, we have the set $\mathcal{U}$ of partial contexts. Given $U \in \mathcal{U}$ and $s \in \mathcal{E}^{(0)}(U)$, we can define the global assignment $v_{U,s} : X \to O$:

$$v_{U,s}(m) = s(m) \quad (m \in U), \qquad v_{U,s}(m) = 1 \quad (m \notin U).$$

Note that $v_{U,s} = v_{U',s'}$ implies $U = U'$ and $s = s'$. Each such assignment $v_{U,s}$ defines a column vector $\mathbf{v}_{U,s} = \mathbf{M}[\_, v_{U,s}]$. There are clearly $D = \sum_{U \in \mathcal{U}} (l-1)^{|U|}$ such assignments.

Proposition 5.3 The set of vectors $\{\mathbf{v}_{U,s}\}_{U \in \mathcal{U},\, s \in \mathcal{E}^{(0)}(U)}$ is linearly independent. Thus the dimension of the linear subspace of $\mathbb{R}^t$ spanned by the vectors arising from global sections is bounded below by $D$.

Proof Suppose that $\sum_{U,s} \mu_{U,s} \mathbf{v}_{U,s} = 0$. We shall show that $\mu_{U,s} = 0$ for all $U, s$, by complete induction on $|X \setminus U|$.

Given some $U', s'$, we choose a row of the incidence matrix $(C, s_0)$ such that $U' \subseteq C$ and $v_{U',s'}|C = s_0$, so that $s_0|U' = s'$. Note that, for any $U'', s''$, $\mathbf{M}[(C, s_0), v_{U'',s''}] = 1$ if and only if $v_{U'',s''}|C = s_0$, if and only if $U'' \cap C = U'$ and $s''|U' = s'$. If $v_{U'',s''} \neq v_{U',s'}$, we must then have $U'' \supsetneq U'$; so by induction hypothesis, $\mu_{U'',s''} = 0$. Using the $(C, s_0)$ component of the vector equation $\sum_{U,s} \mu_{U,s} \mathbf{v}_{U,s} = 0$, we conclude that $\mu_{U',s'} = 0$. □

We now come to our main result. 

Theorem 5.4 Let $\mathcal{M}$ be any measurement cover. The linear subspaces generated by the non-contextual and the no-signalling models over this cover coincide, with dimension $D$.

Proof Since the non-contextual models are a subset of the no-signalling models, this follows immediately from the matching lower bound on the dimension of the local subspace from Proposition 5.3 and upper bound on the dimension of the no-signalling space from Proposition 5.2. □

As an immediate consequence of this result, we have:

Theorem 5.5 Let $\mathcal{M}$ be any measurement cover, and $e$ a probabilistic model over this cover, with corresponding vector $\mathbf{V} \in \mathbb{R}^t$. Then the linear system $\mathbf{M}\mathbf{X} = \mathbf{V}$ has a solution over the reals.

We can also apply this result to the incidence matrix. Let $\mathbf{M}$ be the incidence matrix defined over a cover $\mathcal{M}$ and set of outcomes $O$.

Proposition 5.6 The rank of $\mathbf{M}$, as a matrix over the reals, is $D$.

Proof The incidence matrix defines a linear map from the vector space generated by the global assignments into $\mathbb{R}^t$. By Theorem 5.4, the dimension of the image of this map is $D$. □

5.2. Example: The PR Box

We consider the PR box:

                (0,0)    (1,0)    (0,1)    (1,1)
    (a, b)      1/2      0        0        1/2
    (a', b)     1/2      0        0        1/2
    (a, b')     1/2      0        0        1/2
    (a', b')    0        1/2      1/2      0

A simple solution of the linear system for the PR box is the vector

[1/2, 0, 0, 0, -1/2, 0, 1/2, 0, -1/2, 1/2, 0, 0, 1/2, 0, 0, 0].

This vector can be taken as giving a local hidden-variable realization of the PR box using negative probabilities. Similar realizations can be given for the other PR boxes.
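That the quoted vector solves the system can be verified by direct multiplication. The sketch below assumes the same enumeration as elsewhere in this section: rows ordered as in Figure 1 and columns indexed (from 0) by $8b + 4a + 2b' + a'$, a convention inferred from the equations in the proof of Proposition 4.2. All helper names are hypothetical.

```python
from fractions import Fraction as F

CONTEXTS = (("a", "b"), ("a'", "b"), ("a", "b'"), ("a'", "b'"))
OUTCOMES = ((0, 0), (1, 0), (0, 1), (1, 1))

def decode(j):
    # Assumed column order: 0-based index = 8*b + 4*a + 2*b' + a'.
    return {"b": (j >> 3) & 1, "a": (j >> 2) & 1,
            "b'": (j >> 1) & 1, "a'": j & 1}

def incidence_matrix():
    return [[1 if (decode(j)[m1], decode(j)[m2]) == o else 0
             for j in range(16)]
            for (m1, m2) in CONTEXTS for o in OUTCOMES]

def apply(M, X):
    """Matrix-vector product MX with exact arithmetic."""
    return [sum(m * x for m, x in zip(row, X)) for row in M]

half = F(1, 2)
# The PR box table, flattened row by row in the order of Figure 1.
PR_BOX = [half, 0, 0, half] * 3 + [0, half, half, 0]
# The signed weights on global assignments quoted in the text.
X = [half, 0, 0, 0, -half, 0, half, 0, -half, half, 0, 0, half, 0, 0, 0]

V = apply(incidence_matrix(), X)
```

Here `V` coincides with `PR_BOX`, the weights sum to 1, and two of them are negative, so the vector is a signed measure rather than a probability distribution.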
We now consider some symmetry properties of a cover $\mathcal{M}$, and the associated family of partial contexts $\mathcal{U}$. We define $\mathcal{U}^{(j)}$ to be the set of elements of $\mathcal{U}$ of cardinality $j$, $0 \leq j \leq n$. We say that $\mathcal{M}$ is homogeneous if the following conditions hold:

(i) All the contexts $C \in \mathcal{M}$ have the same number $n$ of elements.

(ii) Every set $U \in \mathcal{U}^{(j)}$ is a subset of the same number $N_j$ of contexts $C \in \mathcal{M}$. Note that we must always have $N_0 = p$, where $p = |\mathcal{M}|$.

We consider some examples:

• Every $(n, k, l)$-type Bell scenario is homogeneous, with $p = k^n$, and $N_j = k^{n-j}$.

• The measurement cover described in Section 2.4.2, corresponding to the Kochen-Specker proof from [21], is homogeneous, with $p = 9$, $n = 4$, $N_1 = 2$, and $N_j = 1$ for all $2 \leq j \leq 4$.

• Many of the constructions used in Kochen-Specker proofs are homogeneous, for example the cover corresponding to the Peres-Mermin magic square [27], which consists of the rows and columns of the table

    A  B  C
    D  E  F
    G  H  I

In this case, $p = 6$, $n = 3$, $N_1 = 2$, and $N_2 = N_3 = 1$.

Proposition 5.7 Let $\mathcal{M}$ be a homogeneous cover. Then:

$$D = p \sum_{j=0}^{n} \binom{n}{j} \frac{(l-1)^j}{N_j}.$$

Proof From the definitions,

$$D = \sum_{U \in \mathcal{U}} (l-1)^{|U|} = \sum_{j=0}^{n} |\mathcal{U}^{(j)}| \, (l-1)^j.$$

Homogeneity implies that $|\mathcal{U}^{(j)}| = p \binom{n}{j} / N_j$. □

We now apply this result to our examples:

• For $(n, k, l)$-type scenarios, we have

$$D = \sum_{j=0}^{n} \binom{n}{j} k^j (l-1)^j.$$

Applying the binomial theorem, we obtain $D = (k(l-1) + 1)^n$. This retrieves the dimension established in [24], with the minor difference that the value given there is $D - 1$. This apparent discrepancy arises because marginalization over the empty set is excluded in [24], using the fact that, by normalization, $e_{\emptyset}(\emptyset) = 1$. However, in this case the equations (1) are affine rather than linear.

• For the 18-vector cover, taking $l = 2$, we obtain $D = 118$. This can be compared with the dimension of the ambient vector space, which is $9 \cdot 2^4 = 144$.

• The corresponding value for the Peres-Mermin square is $D = 34$, with ambient dimension 48.
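The three values above can be recomputed from the formula of Proposition 5.7, and for a small cover also by direct enumeration of partial contexts. This is a sketch for checking the arithmetic; the function names are hypothetical, and the Peres-Mermin contexts are written as strings of the letters in the table.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def dimension_homogeneous(p, n, l, N):
    """D = p * sum_j C(n, j) (l-1)^j / N_j, where N[j] holds N_j."""
    total = sum(Fraction(comb(n, j) * (l - 1) ** j, N[j])
                for j in range(n + 1))
    return p * total

def dimension_direct(cover, l):
    """D = sum over all partial contexts U of (l-1)^|U|, by enumeration."""
    partial = {frozenset(U)
               for C in cover
               for r in range(len(C) + 1)
               for U in combinations(C, r)}
    return sum((l - 1) ** len(U) for U in partial)

def bell_dimension(n, k, l):
    """Homogeneous formula specialised to p = k^n and N_j = k^(n-j)."""
    return dimension_homogeneous(k ** n, n, l,
                                 [k ** (n - j) for j in range(n + 1)])

# Rows and columns of the Peres-Mermin square.
PERES_MERMIN = ["ABC", "DEF", "GHI", "ADG", "BEH", "CFI"]

D_18_vector = dimension_homogeneous(9, 4, 2, [9, 2, 1, 1, 1])
D_peres_mermin = dimension_homogeneous(6, 3, 2, [6, 2, 1, 1])
```

For the 18-vector cover the sum is $9(1/9 + 4/2 + 6 + 4 + 1) = 118$, and for the Peres-Mermin square it is $6(1/6 + 3/2 + 3 + 1) = 34$; the direct enumeration gives the same value for the square.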

We can also apply this formula to the rank of the incidence matrix. For example, for $(n, 2, 2)$-scenarios, where the incidence matrix is of size $4^n \times 4^n$, the rank is $3^n$.

This formula for the rank can be made visually apparent in this case, by noting that, with a suitable choice of enumeration for the rows and columns, the incidence matrices $\mathbf{M}(n)$ have a self-similar inductive structure:

$$\mathbf{M}(1) = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \qquad
\mathbf{M}(n+1) = \begin{pmatrix} \mathbf{M}(n) & \mathbf{M}(n) & 0 & 0 \\ 0 & 0 & \mathbf{M}(n) & \mathbf{M}(n) \\ \mathbf{M}(n) & 0 & \mathbf{M}(n) & 0 \\ 0 & \mathbf{M}(n) & 0 & \mathbf{M}(n) \end{pmatrix}$$
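The self-similar structure can be explored numerically by reading the block form as a Kronecker product of $\mathbf{M}(1)$ with $\mathbf{M}(n)$. That reading is an assumption of this sketch, as are the helper names; the rank is computed exactly over the rationals.

```python
from fractions import Fraction

M1 = [[1, 1, 0, 0],
      [0, 0, 1, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1]]

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def M(n):
    """M(n) built inductively as M(1) (x) M(n-1), assuming the block form."""
    result = M1
    for _ in range(n - 1):
        result = kron(M1, result)
    return result

def rank(matrix):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk
```

Since the first two rows of $\mathbf{M}(1)$ have the same sum as the last two, its rank is 3, and the rank of a Kronecker product is the product of the ranks, which gives $3^n$.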



5.4. Global Sections and No-Signalling 



No-signalling has been built into our notion of empirical model through the requirement of compatibility of the family $\{e_C\}$. Note, though, that any family, whether compatible or not, gives rise to a linear system of equations $\mathbf{M}\mathbf{X} = \mathbf{V}$. If this system has a solution, so that the family has a global section, it is automatically compatible, and hence satisfies no-signalling.

Proposition 5.8 Let $d \in \mathcal{D}_R\mathcal{E}(X)$ be a global section. Then the family $\{d|C\}_{C \in \mathcal{M}}$ is compatible.

Proof This follows immediately from the functoriality of restriction. For any $C, C' \in \mathcal{M}$:

$$C \cap C' \subseteq C \subseteq X$$

and thus $(d|C)|C \cap C' = d|C \cap C'$. Similarly, $(d|C')|C \cap C' = d|C \cap C'$. Hence $d|C$ and $d|C'$ agree on their overlap. □

Combining this result with Theorem 5.5, we obtain the following Theorem.

Theorem 5.9 Probability models have local hidden-variable realizations with negative probabilities if and only if they satisfy no-signalling.

Thus we have a striking equivalence between no-signalling models, and those admitting local hidden-variable realizations with negative probabilities.



6. Strong Contextuality 

Consider a probability model over a cover $\mathcal{M}$. By Proposition 4.4, if the model is extendable over $\mathcal{D}_{\mathbb{R}_{\geq 0}}$, its support is extendable over the booleans. This means that there is a boolean distribution $d$ on $\mathcal{E}(X)$ which restricts to $\mathrm{supp}(e_C)$ for every context $C \in \mathcal{M}$. Such a distribution is simply a non-empty subset $S$ of $\mathcal{E}(X)$. The condition that $d|C = \mathrm{supp}(e_C)$ means that, for all $s \in S$, $s|C \in \mathrm{supp}(e_C)$ for every $C \in \mathcal{M}$; and moreover, every section in $\mathrm{supp}(e_C)$ is of the form $s|C$ for some $s$ in $S$.

Given an empirical model $e$, we define the set

$$S_e := \{s \in \mathcal{E}(X) : \forall C \in \mathcal{M}.\; s|C \in \mathrm{supp}(e_C)\}.$$

Thus a consequence of the extendability of $e$ is that $S_e$ is non-empty.

We say that the model $e$ is strongly contextual if this set $S_e$ is empty. Whereas a global section for an empirical model $e$ completely determines its behaviour, asking for some assignment $s \in \mathcal{E}(X)$ which is consistent with the support of $e$ is much weaker. The negative property that not even one such assignment exists is correspondingly much stronger. Indeed, the Hardy model, which as we saw in the previous section is possibilistically non-extendable, is not strongly contextual. The global assignment

$$\{a \mapsto 1,\; a' \mapsto 0,\; b \mapsto 1,\; b' \mapsto 0\}$$

is in $S_e$ for this model. The Bell model similarly fails to be strongly contextual.

The question now arises: are there models coming from quantum mechanics which are strongly contextual in this sense?

We shall now show that the well-known GHZ models [12], of type $(n, 2, 2)$ for all $n > 2$, are strongly contextual. This will establish a strict hierarchy

Bell < Hardy < GHZ

of increasing strengths of obstructions to non-contextual behaviour for these salient models.

The GHZ model of type $(n, 2, 2)$ can be specified as follows. We label the two measurements at each part as $X^{(i)}$ and $Y^{(i)}$, and the outcomes as 0 and 1. For each context $C$, every $s$ in the support of the model satisfies the following conditions:

• If the number of $Y$ measurements in $C$ is a multiple of 4, the number of 1's in the outcomes specified by $s$ is even.

• If the number of $Y$ measurements is $4k + 2$, the number of 1's in the outcomes is odd.

We will see later how a model with these properties can be realized in quantum mechanics.

Proposition 6.1 The GHZ models are strongly contextual, for all $n \geq 3$.

Proof We consider the case where $n = 4k$, $k \geq 1$. Assume for a contradiction that we have a global section $s \in S_e$ for the GHZ model $e$.

If we take $Y$ measurements at every part, the number of 1 outcomes under the assignment is even. Replacing any two $Y$'s by $X$'s changes the residue class mod 4 of the number of $Y$'s, and hence must result in the opposite parity for the number of 1 outcomes under the assignment. Thus for any $Y^{(i)}$, $Y^{(j)}$ assigned the same value, if we substitute $X$'s in those positions they must receive different values under $s$. Similarly, for any $Y^{(i)}$, $Y^{(j)}$ assigned different values, the corresponding $X^{(i)}$, $X^{(j)}$ must receive the same value.

Suppose firstly that not all $Y^{(i)}$ are assigned the same value by $s$. Then for some $i, j, k$, $Y^{(i)}$ is assigned the same value as $Y^{(j)}$, and $Y^{(j)}$ is assigned a different value to $Y^{(k)}$. Thus $Y^{(i)}$ is also assigned a different value to $Y^{(k)}$. Then $X^{(i)}$ is assigned the same value as $X^{(k)}$, and $X^{(j)}$ is assigned the same value as $X^{(k)}$. By transitivity, $X^{(i)}$ is assigned the same value as $X^{(j)}$, yielding a contradiction.

The remaining cases are where all $Y$'s receive the same value. Then any pair of $X$'s must receive different values. But taking any 3 $X$'s, this yields a contradiction, since there are only two values, so some pair must receive the same value.

The case when $n = 4k + 2$, $k \geq 1$, is proved in the same fashion, interchanging the parities. When $n \geq 5$ is odd, we start with a context containing one $X$, and again proceed similarly.

The most familiar case, for $n = 3$, does not admit this argument, which relies on having at least 4 $Y$'s in the initial configuration. However, for this case one can easily adapt the well-known argument of Mermin using 'instruction sets' [28] to prove strong contextuality. This uses a case analysis to show that there are 8 possible global sections satisfying the parity constraint on the 3 measurement combinations with 2 $Y$'s and 1 $X$; and all of these violate the constraint for the $XXX$ measurement. □
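For small $n$ the proposition can be confirmed by exhaustive search over global assignments. The sketch below encodes only the two parity conditions stated above; contexts with an odd number of $Y$ measurements are left unconstrained, since no condition is given for them in the text. The function name is hypothetical.

```python
from itertools import product

def ghz_consistent_assignments(n):
    """Global assignments consistent with the GHZ parity conditions.

    An assignment is a pair (x, y) of bit tuples, where x[i] and y[i] are
    the outcomes assigned to the X and Y measurements at part i.
    """
    solutions = []
    for bits in product((0, 1), repeat=2 * n):
        x, y = bits[:n], bits[n:]
        consistent = True
        for context in product("XY", repeat=n):
            num_y = context.count("Y")
            if num_y % 2 == 1:
                continue  # no condition stated for an odd number of Y's
            ones = sum(y[i] if context[i] == "Y" else x[i] for i in range(n))
            required_parity = 0 if num_y % 4 == 0 else 1
            if ones % 2 != required_parity:
                consistent = False
                break
        if consistent:
            solutions.append((x, y))
    return solutions
```

The search returns no assignment for $n = 3, 4, 5$, so $S_e$ is empty in those cases, while for $n = 2$ it does return assignments, consistent with the restriction to $n > 2$.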

We shall also mention an elegant result due to Ray Lal (private communication).

Proposition 6.2 (Lal) The only strongly contextual no-signalling models of type (2, 2, 2) are the PR boxes.

Thus strong contextuality actually characterizes the PR boxes.

6.1. Strong Contextuality and Maximal Non-Locality

The property of strong contextuality is defined in a simple 'qualitative' fashion, in terms of the support of a model. As we shall now see, for probabilistic models it is equivalent to a notion which can be defined in quantitative terms, and has been studied in this form in the special case of Bell-type scenarios.*

* We thank one of the journal referees for pointing out this connection.

Suppose that $\{e_C\}_{C \in \mathcal{M}}$ is a model over the presheaf $\mathcal{D}_{\mathbb{R}_{\geq 0}}$. We consider convex decompositions

$$e = \lambda L + (1 - \lambda) q, \qquad 0 \leq \lambda \leq 1, \qquad (2)$$

where $L$ is a local model, and $q$ a no-signalling model. This means that, for every $C \in \mathcal{M}$, and $s \in \mathcal{E}(C)$, we have:

$$e_C(s) = \lambda L_C(s) + (1 - \lambda) q_C(s).$$

We define the non-contextual fraction of $e$ to be the supremum over all $\lambda$ appearing in such convex decompositions (2). This notion was introduced for Bell-type scenarios in [29]; see also [30], where the terminology local fraction is used. A model with local fraction 0 is defined to be maximally non-local. Geometrically, this corresponds to the model being on a face of the no-signalling polytope with no local elements.

In the general setting of models defined on arbitrary measurement covers, we say that a model $e$ is maximally contextual if the non-contextual fraction of $e$ is 0.

Proposition 6.3 A model $e$ is strongly contextual if and only if it is maximally contextual.

Proof Suppose firstly that $e$ admits a convex decomposition (2). By results established elsewhere in the paper (see also Theorem 8.1), we can take $L$ to be a convex sum of deterministic models $\sum_i \mu_i \delta_{s_i}$, where each $s_i \in \mathcal{E}(X)$ is a global assignment. If $\lambda > 0$, then from (2), $s_i \in S_e$ for each $i$. Thus strong contextuality implies maximal contextuality.

For the converse, suppose that $s \in S_e$. Taking $L = \delta_s$, we shall define $q$ such that (2) holds. For each $C \in \mathcal{M}$ and $s' \in \mathcal{E}(C)$:

$$q_C(s') := \frac{e_C(s') - \lambda \cdot \delta_{s|C}(s')}{1 - \lambda}.$$

It is easily verified that, for each $C$, $\sum_{s' \in \mathcal{E}(C)} q_C(s') = 1$. To ensure that $q$ is always non-negative, we must have $\lambda \leq \inf_{C \in \mathcal{M}} e_C(s|C)$. Since this is the infimum of a finite set of positive numbers, we can find $\lambda > 0$ satisfying this condition.

It remains to verify that $q$ is no-signalling, i.e. that $\{q_C\}$ forms a compatible family. Given $C, C' \in \mathcal{M}$, fix $s_0 \in \mathcal{E}(C \cap C')$. Now

$$q_C|C \cap C'(s_0) = \frac{1}{1 - \lambda} \Big[ \Big( \sum_{s' \in \mathcal{E}(C),\; s'|C \cap C' = s_0} e_C(s') \Big) - \lambda \cdot \delta_{s|C \cap C'}(s_0) \Big].$$

A similar analysis applies to $q_{C'}|C \cap C'(s_0)$. Using the compatibility of $e$, we conclude that $q_C|C \cap C' = q_{C'}|C \cap C'$. □
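The construction of $q$ in the second half of the proof can be carried out concretely on the Bell model of Section 4.4, whose support does contain a global assignment. The following is an illustrative sketch with hypothetical helper names; it takes $s$ to be the all-zero assignment and $\lambda$ equal to the bound $\min_C e_C(s|C)$, which is one admissible choice and is not claimed to be the supremum.

```python
from fractions import Fraction as F

OUTCOMES = ((0, 0), (1, 0), (0, 1), (1, 1))
BELL = {
    ("a", "b"):   [F(1, 2), F(0), F(0), F(1, 2)],
    ("a'", "b"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a", "b'"):  [F(3, 8), F(1, 8), F(1, 8), F(3, 8)],
    ("a'", "b'"): [F(1, 8), F(3, 8), F(3, 8), F(1, 8)],
}

def split_off(model, s, lam):
    """q_C(s') = (e_C(s') - lam * delta_{s|C}(s')) / (1 - lam)."""
    q = {}
    for C, weights in model.items():
        s_C = tuple(s[m] for m in C)
        q[C] = [(w - (lam if o == s_C else 0)) / (1 - lam)
                for o, w in zip(OUTCOMES, weights)]
    return q

def marginal(model, C, m, value):
    """Weight that context C gives to measurement m having the given outcome."""
    pos = C.index(m)
    return sum(w for o, w in zip(OUTCOMES, model[C]) if o[pos] == value)

def is_compatible(model):
    """Check that overlapping contexts agree on their shared measurement."""
    return all(marginal(model, C, m, v) == marginal(model, D, m, v)
               for C in model for D in model
               for m in set(C) & set(D) for v in (0, 1))

s = {"a": 0, "a'": 0, "b": 0, "b'": 0}
lam = min(BELL[C][OUTCOMES.index(tuple(s[m] for m in C))] for C in BELL)
q = split_off(BELL, s, lam)
```

Here $\lambda = 1/8$, and the resulting $q$ is non-negative, normalized in every context, and compatible, so the Bell model has non-contextual fraction at least $1/8$ and is not maximally contextual.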

We can use this equivalence to give a characterization of maximal contextuality, and in particular of maximal non-locality, in terms of a constraint satisfaction problem. In the case of dichotomic measurements, this reduces to a boolean satisfiability problem.

We recall that a constraint satisfaction problem (CSP) is specified by a triple $(V, K, \mathcal{R})$, where $V$ is a finite set of variables, $K$ is a finite set of values, and $\mathcal{R}$ is a finite set of constraints. A constraint is a pair $(C, S)$, where $C \subseteq V$, and $S \subseteq K^C$. (It is more common to define a constraint as an ordered list of $k$ variables, and a set of $k$-tuples of values, but this is obviously equivalent to the version given.) An assignment $s : V \to K$ satisfies a constraint $(C, S)$ if $s|C \in S$. A solution of the CSP $(V, K, \mathcal{R})$ is an assignment $s : V \to K$ which satisfies every constraint in $\mathcal{R}$.

Let $e$ be a model over a cover $\mathcal{M}$, with outcome set $O$. For each $C \in \mathcal{M}$, we have the set $S_e(C) := \mathrm{supp}(e_C) \subseteq O^C$. We can associate the CSP $(X, O, \{S_e(C) \mid C \in \mathcal{M}\})$ with $e$.

Proposition 6.4 A probabilistic model $e$ is maximally contextual if and only if the corresponding CSP has no solution. In particular, for Bell-type scenarios, $e$ is maximally non-local if and only if the corresponding CSP has no solution.

Proof This follows directly from Proposition 6.3, since global sections in the support of $e$ are clearly in bijective correspondence with solutions of the associated CSP. □

In the case of dichotomic measurements, the CSP reduces to a boolean satisfiability problem. In this case, we interpret the two possible outcomes as truth values, and $X$ as a set of propositional variables.

Given a model $e$, we define the formula

$$\phi_e := \bigwedge_{C \in \mathcal{M}} \; \bigvee_{s \in S_e(C)} \phi_s$$

where for a section $s \in O^C$ we define the corresponding formula

$$\phi_s := \bigwedge_{m \in C,\; s(m) = \mathrm{true}} m \;\; \wedge \bigwedge_{m \in C,\; s(m) = \mathrm{false}} \neg m.$$

Proposition 6.5 A probabilistic model $e$ with dichotomic measurements is maximally contextual if and only if the corresponding formula $\phi_e$ is unsatisfiable. In particular, for Bell-type scenarios, $e$ is maximally non-local if and only if $\phi_e$ is unsatisfiable.
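The CSP associated with a model can be solved by exhaustive search when the scenario is small. The sketch below is a naive solver written for illustration, with hypothetical names; it pairs each context with its support explicitly, and uses the supports of the Hardy model and the PR box as given in the tables of Sections 4.4.2 and 5.2.

```python
from itertools import product

def solve_csp(variables, values, constraints):
    """All assignments V -> K satisfying every constraint (C, S)."""
    solutions = []
    for combo in product(values, repeat=len(variables)):
        s = dict(zip(variables, combo))
        # s satisfies (C, S) when its restriction to C lies in S.
        if all(tuple(s[m] for m in C) in S for C, S in constraints):
            solutions.append(s)
    return solutions

X = ("a", "a'", "b", "b'")
CONTEXTS = (("a", "b"), ("a'", "b"), ("a", "b'"), ("a'", "b'"))
OUTCOMES = ((0, 0), (1, 0), (0, 1), (1, 1))

def support_constraints(table):
    """Constraints (C, supp(e_C)) read off a table with one row per context."""
    return [(C, {o for o, w in zip(OUTCOMES, row) if w})
            for C, row in zip(CONTEXTS, table)]

HARDY = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1], [1, 1, 1, 0]]
PR_BOX = [[1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]

hardy_solutions = solve_csp(X, (0, 1), support_constraints(HARDY))
pr_solutions = solve_csp(X, (0, 1), support_constraints(PR_BOX))
```

The PR box yields no solution, so it is maximally contextual, whereas the Hardy model has solutions, among them the assignment $a \mapsto 1$, $a' \mapsto 0$, $b \mapsto 1$, $b' \mapsto 0$.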

7. Generic Strong Contextuality and Kochen-Specker Theorems 

Let $e$ and $e'$ be models, such that the support of $e$ is included in the support of $e'$. Then $S_e$ is included in $S_{e'}$; hence if $e'$ is strongly contextual, so is $e$. Thus by showing strong contextuality for a single model, we can show it for a whole class of models.

We shall fix our set of outcomes as $\{0, 1\}$. This means that we can define subsets of $\mathcal{E}(C)$ by formulas $\phi_C$, with the elements of $C$ used as propositional variables. A section $s : C \to \{0, 1\}$ can be viewed as a boolean assignment for these variables, and $\phi_C$ defines the set of its satisfying assignments.

We are interested in particular in the formula

$$\mathrm{ONE}(C) := \bigvee_{m \in C} \Big( m \wedge \bigwedge_{m' \in C \setminus \{m\}} \neg m' \Big).$$

This is satisfied by those assignments with exactly one outcome set to 1.
A Kochen-Specker-type result [2^ can be factored into two parts: 

(i) Defining covers $\mathcal{M}$ such that there is no global section $s \in \mathcal{E}(X)$ which satisfies the formula

$$\phi_{\mathcal{M}} := \bigwedge_{C \in \mathcal{M}} \mathrm{ONE}(C).$$

(ii) Providing quantum representations for these covers, which interpret the measurements by quantum observables in such a way that every quantum model for this set of observables has its support included in $\mathrm{ONE}(C)$ for each $C \in \mathcal{M}$, and hence is strongly contextual.

We shall explain the quantum aspects in a later section. Here we shall investigate the combinatorial structures involved in the first part.

We shall give a simple combinatorial condition on the cover $\mathcal{M}$ which is implied by the existence of a global section $s$ satisfying $\phi_{\mathcal{M}}$. Violation of this condition therefore suffices to prove that no such global section exists.

For each $m \in X$, we define

$$\mathcal{M}(m) := \{C \in \mathcal{M} \mid m \in C\}.$$

Proposition 7.1 If $\phi_{\mathcal{M}}$ has a global section, then every common divisor of $\{|\mathcal{M}(m)| \mid m \in X\}$ must divide $|\mathcal{M}|$.

Proof Suppose there is a global section $s : X \to \{0, 1\}$ satisfying $\phi_{\mathcal{M}}$. Consider the set $X' \subseteq X$ of those $m$ such that $s(m) = 1$. Exactly one element of $X'$ must occur in every $C \in \mathcal{M}$. Hence there is a partition of $\mathcal{M}$ into the subsets $\mathcal{M}(m)$ indexed by the elements of $X'$. Thus

$$|\mathcal{M}| = \sum_{m \in X'} |\mathcal{M}(m)|.$$

It follows that, if there is a common divisor of the numbers $|\mathcal{M}(m)|$, it must divide $|\mathcal{M}|$. □

For example, if every $m \in X$ appears in an even number of elements of $\mathcal{M}$, while $\mathcal{M}$ has an odd number of elements, then $\phi_{\mathcal{M}}$ has no global section. This corresponds to the 'parity proofs' which are often used in verifying Kochen-Specker-type results [21, 34].

The simplest example of this situation is the 'triangle', i.e. the measurement cover with elements

$$\{a, b\}, \{b, c\}, \{a, c\}.$$

This example has also been discussed, in a somewhat different context, in [35].

An example where $X$ has 18 elements, and there are 9 maximal compatible sets, each with four elements, such that each element of $X$ is in two of these, appears in the 18-vector proof of the Kochen-Specker Theorem in [21].
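The divisibility condition of Proposition 7.1 is simple to test, and for small covers it can be compared against an exhaustive search. This is a sketch with hypothetical function names; it uses the fact that every common divisor of the numbers $|\mathcal{M}(m)|$ divides $|\mathcal{M}|$ exactly when their greatest common divisor does. The Peres-Mermin cover is included only as a contrasting case in which the necessary condition holds.

```python
from functools import reduce
from itertools import product
from math import gcd

def divisor_condition_holds(cover):
    """Necessary condition of Proposition 7.1: gcd of the |M(m)| divides |M|."""
    X = sorted(set().union(*cover))
    counts = [sum(1 for C in cover if m in C) for m in X]
    return len(cover) % reduce(gcd, counts) == 0

def has_one_section(cover):
    """Search for s : X -> {0, 1} with exactly one 1 in every context."""
    X = sorted(set().union(*cover))
    for bits in product((0, 1), repeat=len(X)):
        s = dict(zip(X, bits))
        if all(sum(s[m] for m in C) == 1 for C in cover):
            return True
    return False

TRIANGLE = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
PERES_MERMIN = [set("ABC"), set("DEF"), set("GHI"),
                set("ADG"), set("BEH"), set("CFI")]
```

For the triangle, each measurement lies in 2 contexts while there are 3 contexts, so the condition fails and the search finds no section. For the rows and columns of the Peres-Mermin square the condition holds and a section exists, for instance setting A, E and I to 1.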

7.1. Kochen-Specker Graphs 

The measurement covers which can be represented by quantum systems are of a particular form: 
they are generated by a symmetric binary compatibility relation, since compatibility in 
quantum mechanics means that the observables pairwise commute. Thus, for example, the 
'triangle' cannot arise from quantum observables. 

This suggests that we should take account of this feature. It turns out that this leads us 
directly to some standard notions in graph theory. 

An undirected graph $G$ is specified by a finite set of vertices $V_G$, and a set of edges $E_G$, which
are two-element subsets of $V_G$. A clique of $G$ is a set $C \subseteq V_G$ with an edge between every pair of
vertices in $C$. The set of maximal cliques of $G$ forms a measurement cover $\mathcal{M}_G$.

Let $G$ be a graph. A set $S \subseteq V_G$ is called a stable transversal [36] if for every maximal
clique $C$ of $G$ (i.e. for every $C \in \mathcal{M}_G$), $|S \cap C| = 1$. Note that it is necessarily the case that a
stable transversal is independent, i.e. there is no edge between any pair of elements of $S$, since
otherwise we could extend this pair to a maximal clique containing both.

Proposition 7.2 Let $G$ be a graph. The formula $\varphi_{\mathcal{M}}$ defined on $\mathcal{M}_G$ has a global section if and
only if $G$ has a stable transversal.

Proof Suppose $\varphi_{\mathcal{M}}$ has a global section $s$. Then $T := \{m \in X \mid s(m) = 1\}$ is a stable transversal
of $G$.

Conversely, suppose that $T$ is a stable transversal of $G$. If we define $s$ as the characteristic
function of $T$ on $X$, then $s \models \mathrm{ONE}(C)$ for each maximal clique $C$ of $G$, and so $\varphi_{\mathcal{M}}$ has a global
section. □
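
Proposition 7.2 turns the search for a global section into a search for a stable transversal, which can be carried out directly on small graphs. The sketch below is illustrative only: it enumerates maximal cliques with the basic Bron-Kerbosch recursion (no pivoting) and then tests every vertex subset, so it is exponential and intended for toy examples. The helper names and the cycle-graph examples are our own assumptions, not taken from the paper.

```python
from itertools import combinations


def maximal_cliques(vertices, edges):
    """Enumerate the maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: already-processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(vertices), set())
    return cliques


def stable_transversals(vertices, edges):
    """All sets S meeting every maximal clique in exactly one vertex."""
    cliques = maximal_cliques(vertices, edges)
    ordered = sorted(vertices)
    return [
        set(subset)
        for k in range(len(ordered) + 1)
        for subset in combinations(ordered, k)
        if all(len(set(subset) & c) == 1 for c in cliques)
    ]


def cycle(n):
    """Vertices and edges of the n-cycle; for n >= 4 its maximal cliques are its edges."""
    return range(n), [(i, (i + 1) % n) for i in range(n)]


print(stable_transversals(*cycle(4)))  # the two colour classes of the 4-cycle
print(stable_transversals(*cycle(5)))  # an odd cycle has no stable transversal
```

For the 4-cycle the stable transversals are the two colour classes, so the corresponding formula has a global section; for the 5-cycle none exists, since a set meeting every edge exactly once would give a 2-colouring of an odd cycle.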

In order to apply graph-theoretic results to the quantum situation, we need to know which
graphs can arise from families of quantum observables. For reasons which will be explained when
we discuss quantum representations in Section 9, we are interested in graphs which can be labelled
by vectors in $\mathbb{R}^d$, such that two vertices are adjacent if and only if the corresponding vectors are
orthogonal. It turns out that in graph theory, the complementary notion is used [37], so we shall
say that such graphs have a faithful orthogonal co-representation in $\mathbb{R}^d$. We must also require
that the maximal cliques all have size $d$.

Thus we define Kochen-Specker graphs to be finite graphs G such that: 

• G has a faithful orthogonal co-representation in $\mathbb{R}^d$.

• The maximal cliques of G all have the same size d. 

• G has no stable transversal. 

Any such graph generates a measurement cover $\mathcal{M}_G$ such that the formula $\varphi_{\mathcal{M}_G}$ has no
global section; and every such graph can be realized by quantum observables, as will be shown in
Section 9.2. Thus, these graphs provide explicit finite witnesses for generic strong contextuality. An
example is provided by the orthogonality graph for $\mathbb{R}^4$ defined by the set of 18 vectors given in
[21], as well as the various sets of 31 or more vectors which have been found in $\mathbb{R}^3$ [2] [35] [3S1 HO].

A final desideratum is to provide a purely graph-theoretic condition for the existence of a 
faithful orthogonal co-representation. In [ST] [41] the following result is proved. 
Theorem 7.3 Every graph on $n$ nodes whose complementary graph is $(n - d)$-connected has a
faithful orthogonal co-representation in $\mathbb{R}^d$.

We note that a general graph-theoretic approach to contextuality, on somewhat different lines
to ours, has been developed in [5]. Interesting connections are shown between contextuality and
Lovász's $\vartheta$-function.

In [8], 'compatibility structures' are studied as sets of events rather than measurements. This 
leads into the formalism of convex operational theories as 'generalized probability theories'. By 
contrast, we distinguish between measurements, outcomes and events. This allows the functorial 
presheaf structure of probabilistic models to be articulated. Moreover, we use standard probability 
theory, as encapsulated in the distributions functor $\mathcal{D}_{\mathbb{R}_{\geq 0}}$. The non-classical, contextual features
of models arise from their functorial variation over contexts. At the same time, this mathematical 
structure directly reflects the basic operational scenario described at the beginning of Section 2. 

8. Global Sections and Hidden Variables 

We shall now consider a general notion of hidden-variable model, and show that an empirical 
model is realized by a factorizable hidden-variable model if and only if it has a global section. 

We are given a measurement cover $\mathcal{M}$. We fix a set $\Lambda$ of values for a hidden variable. A
hidden-variable model $h$ over $\Lambda$ assigns, for each $\lambda \in \Lambda$ and $C \in \mathcal{M}$, a distribution $h^{\lambda}_C \in \mathcal{D}_R\mathcal{E}(C)$.
It also assigns a distribution $h_{\Lambda} \in \mathcal{D}_R(\Lambda)$ on the hidden variables. Note that this distribution is
independent of the context; this is the standard structural assumption of $\lambda$-independence [15].
We require that for each $\lambda \in \Lambda$, the family $\{h^{\lambda}_C\}_{C \in \mathcal{M}}$ is compatible, i.e. for all $C, C' \in \mathcal{M}$:

$$h^{\lambda}_C | C \cap C' = h^{\lambda}_{C'} | C \cap C'.$$

Just as compatibility for empirical models corresponds to no-signalling, compatibility for
hidden-variable models corresponds to the parameter independence condition [44] [45].

We say that a hidden-variable model $h$ realizes an empirical model $e$ if the probabilities
specified by $e$ are recovered by averaging over the values of the hidden variable. Formally, this
says that for all $C \in \mathcal{M}$ and $s \in \mathcal{E}(C)$:

$$e_C(s) = \sum_{\lambda \in \Lambda} h^{\lambda}_C(s) \cdot h_{\Lambda}(\lambda).$$

The intended purpose of hidden-variable models is to explain the non-intuitive behaviour of 
empirical models, in particular those arising from quantum mechanics, by showing that it can 
be reproduced by a model whose behaviour is more intuitive, at the cost of introducing hidden 
variables. In particular, one would like to explain the non-local and contextual behaviour predicted 
by quantum mechanics in this way. The general property which a hidden-variable model should 
satisfy in order to provide such an explanation is factorizability, which subsumes both Bell 
locality [1], and a form of non-contextuality at the level of distributions. It is defined as follows.

We say that a hidden-variable model $h$ is factorizable if, for every $C \in \mathcal{M}$, and $s \in \mathcal{E}(C)$:

$$h^{\lambda}_C(s) = \prod_{m \in C} h^{\lambda}_C|\{m\}(s|\{m\}).$$

This says that the probability assigned to a joint outcome factors as the product of the probabilities
assigned to the individual measurements. Note in particular that, if $m \in C \cap C'$, then the
compatibility condition on $h$ implies that $h^{\lambda}_C|\{m\} = h^{\lambda}_{C'}|\{m\}$. Thus the probability distributions
on outcomes of individual measurements are independent of the contexts in which they occur.

For Bell-type scenarios, factorizability corresponds exactly to Bell locality [1]. More generally,
it asserts non-contextuality at the level of distributions.

Our main result can now be stated as follows. 

Theorem 8.1 Let $e$ be an empirical model defined on a measurement cover $\mathcal{M}$ for a distribution
functor $\mathcal{D}_R$. The following are equivalent.

(i) $e$ has a realization by a factorizable hidden-variable model.
(ii) $e$ has a global section.

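Proof Proposition 3.1 shows that (ii) implies (i). It remains to prove the converse.

Suppose that $e$ is realized by a factorizable hidden-variable model $h$. For each $m \in X$, we
define $h^{\lambda}_m := h^{\lambda}_C|\{m\} \in \mathcal{D}_R\mathcal{E}(\{m\})$ for any $C \in \mathcal{M}$ such that $m \in C$. By the compatibility of
the family $\{h^{\lambda}_C\}$, this definition is independent of the choice of $C$. Also, we shall write $s|m$ rather
than $s|\{m\}$. We define a distribution $h^{\lambda}_X \in \mathcal{D}_R\mathcal{E}(X)$ for each $\lambda \in \Lambda$, by:

$$h^{\lambda}_X(s) := \prod_{m \in X} h^{\lambda}_m(s|m).$$

We must show that this is a distribution. We enumerate the set of measurements $X$ as
$X = \{m_1, \ldots, m_p\}$. A global assignment $s \in \mathcal{E}(X)$ can be specified by a tuple $(o_1, \ldots, o_p)$,
where $o_i = s(m_i)$. Now we can calculate:

$$\begin{aligned}
\sum_{s \in \mathcal{E}(X)} \prod_{m \in X} h^{\lambda}_m(s|m)
&= \sum_{o_1} h^{\lambda}_{m_1}(m_1 \mapsto o_1) \cdot \Big(\sum_{o_2} h^{\lambda}_{m_2}(m_2 \mapsto o_2) \cdot \Big( \cdots \Big(\sum_{o_p} h^{\lambda}_{m_p}(m_p \mapsto o_p)\Big) \cdots \Big)\Big) \\
&= \sum_{o_1} h^{\lambda}_{m_1}(m_1 \mapsto o_1) \cdot \Big(\sum_{o_2} h^{\lambda}_{m_2}(m_2 \mapsto o_2) \cdot \big( \cdots (1) \cdots \big)\Big) \\
&\;\;\vdots \\
&= \sum_{o_1} h^{\lambda}_{m_1}(m_1 \mapsto o_1) \cdot 1 = 1.
\end{aligned}$$

We now show that for each context $C$ in $\mathcal{M}$, $h^{\lambda}_X|C = h^{\lambda}_C$. We choose an enumeration of $X$ such
that $C = \{m_1, \ldots, m_q\}$, $q \leq p$.

$$\begin{aligned}
h^{\lambda}_X|C(s) &= \sum_{s' \in \mathcal{E}(X),\, s'|C = s} h^{\lambda}_X(s') \\
&= \sum_{(o_1, \ldots, o_p),\, s = (o_1, \ldots, o_q)} \prod_{i=1}^{p} h^{\lambda}_{m_i}(m_i \mapsto o_i) \\
&= \Big(\prod_{i=1}^{q} h^{\lambda}_{m_i}(m_i \mapsto o_i)\Big) \cdot \Big(\sum_{o_{q+1}, \ldots, o_p} \prod_{j=q+1}^{p} h^{\lambda}_{m_j}(m_j \mapsto o_j)\Big) \\
&= \prod_{i=1}^{q} h^{\lambda}_{m_i}(s|m_i) \\
&= h^{\lambda}_C(s),
\end{aligned}$$

where the second factor in the third line equals 1 by the calculation above, and the last step uses factorizability.

Now we define a distribution $d \in \mathcal{D}_R\mathcal{E}(X)$ by averaging over the hidden variables:

$$d(s) := \sum_{\lambda \in \Lambda} h^{\lambda}_X(s) \cdot h_{\Lambda}(\lambda).$$

We verify that this is a distribution:

$$\begin{aligned}
\sum_{s \in \mathcal{E}(X)} d(s) &= \sum_{\lambda \in \Lambda} \sum_{s \in \mathcal{E}(X)} h^{\lambda}_X(s) \cdot h_{\Lambda}(\lambda) \\
&= \sum_{\lambda \in \Lambda} h_{\Lambda}(\lambda) \cdot \Big(\sum_{s \in \mathcal{E}(X)} h^{\lambda}_X(s)\Big) \\
&= \sum_{\lambda \in \Lambda} h_{\Lambda}(\lambda) \cdot 1 = 1.
\end{aligned}$$

It remains to show that $d$ restricts at each context $C$ to yield $e_C$.

$$\begin{aligned}
d|C(s) &= \sum_{s' \in \mathcal{E}(X),\, s'|C = s} d(s') \\
&= \sum_{s' \in \mathcal{E}(X),\, s'|C = s} \sum_{\lambda \in \Lambda} h^{\lambda}_X(s') \cdot h_{\Lambda}(\lambda) \\
&= \sum_{\lambda \in \Lambda} h_{\Lambda}(\lambda) \cdot h^{\lambda}_X|C(s) \\
&= \sum_{\lambda \in \Lambda} h_{\Lambda}(\lambda) \cdot h^{\lambda}_C(s) \\
&= e_C(s).
\end{aligned}$$

Thus $d$ is a global section for $e$. □
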
This result provides a definitive justification for equating the phenomena of non-locality and
contextuality with obstructions to the existence of global sections. 
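
The construction in the proof of Theorem 8.1 can be followed numerically on a small Bell-type cover. The sketch below is a hedged illustration under our own assumptions: the two hidden-variable values `l0`, `l1`, their weights, and the single-measurement distributions in `h_single` are invented numbers, chosen only so that the model is factorizable by construction. The code builds the global distribution $d$ by averaging the product distributions and checks that it marginalizes to the empirical model at every context.

```python
from fractions import Fraction as F
from itertools import product

X = ["a1", "a2", "b1", "b2"]  # measurements
COVER = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
OUTCOMES = (0, 1)

# Hypothetical data: weights h_Lambda and, for each lambda, a distribution
# (P(outcome 0), P(outcome 1)) on each single measurement.
h_Lambda = {"l0": F(1, 3), "l1": F(2, 3)}
h_single = {
    "l0": {"a1": (F(1), F(0)), "a2": (F(1, 2), F(1, 2)),
           "b1": (F(1, 4), F(3, 4)), "b2": (F(0), F(1))},
    "l1": {"a1": (F(1, 3), F(2, 3)), "a2": (F(1), F(0)),
           "b1": (F(1, 2), F(1, 2)), "b2": (F(2, 5), F(3, 5))},
}


def sections(measurements):
    """All joint outcomes over the given measurements, as tuples of (m, o) pairs."""
    for outs in product(OUTCOMES, repeat=len(measurements)):
        yield tuple(zip(measurements, outs))


def h(lam, s):
    """Product distribution: factorizability on contexts, h^lambda_X on global sections."""
    p = F(1)
    for m, o in s:
        p *= h_single[lam][m][o]
    return p


def averaged(s):
    """Average over the hidden variable: gives e_C(s) on a context, d(s) globally."""
    return sum(h(lam, s) * weight for lam, weight in h_Lambda.items())


def restrict(s, context):
    """Restriction s|C of a global section to a context."""
    return tuple((m, o) for m, o in s if m in context)


def marginal(context, s):
    """d|C(s): sum d over all global sections restricting to s."""
    return sum(averaged(g) for g in sections(X) if restrict(g, context) == s)


print(sum(averaged(g) for g in sections(X)))  # total mass of d
print(all(marginal(c, s) == averaged(s) for c in COVER for s in sections(c)))
```

Exact rational arithmetic is used so that the equalities $d|C = e_C$ are checked exactly rather than up to floating-point error.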

9. Quantum Representations 

Since our aim is to investigate general properties of systems and physical theories, it has 
been important that our entire discussion has been conducted without presupposing quantum 
mechanics, Hilbert spaces, etc. The mathematical structures which we have used have arisen in a 
rather transparent fashion from the basic experimental scenario with which we began. 

However, it is important to make explicit how the structures we have described can be 
represented in quantum mechanics. 

We begin with measurement covers. A quantum representation of a measurement cover on
a set $X$ can be described as follows. We fix a Hilbert space $\mathcal{H}$. As usual, an observable is a
bounded self-adjoint operator $A$ on $\mathcal{H}$. Two observables $A$, $B$ are compatible if they commute:
$AB = BA$. In this case, the composite $AB$ is again self-adjoint, and hence forms an observable.

Given a set $\mathcal{X} = \{A_x\}_{x \in X}$ of observables on $\mathcal{H}$ indexed by $X$, we form a measurement cover by
taking $\mathcal{M}$ to be the set of all maximal commuting subsets of $\mathcal{X}$. Note that pairwise commutation
implies that the observables in each such subset, composed in any order, form a well-defined
observable. We say that an abstract measurement cover $\mathcal{M}$ has a quantum representation if
it arises in this way.

For Bell-type scenarios, a quantum representation will have a particular form, reflecting the
usual idea that the measurements are performed at a number of different sites, which may be
space-like separated. We will have a family $\{\mathcal{H}_i\}$ of Hilbert spaces, one for each part. The
elements of $X_i$ will index a family $\mathcal{X}_i$ of incompatible (i.e. non-commuting) observables on $\mathcal{H}_i$. We
make these into local observables on the compound system $\mathcal{H} = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_k$, by defining
$A^i := I \otimes \cdots \otimes A \otimes \cdots \otimes I$ for each $A \in \mathcal{X}_i$. Then $A^i$ commutes with $B^j$ whenever $i \neq j$, and we
can form a measurement cover of Bell type on the compound system.

It is interesting in this connection to discuss a result due to Tsirelson [35] . This result can be 
stated, following [47], as follows: 

Theorem 9.1 Let $\{X_i\}$ and $\{Y_j\}$ be finite, commuting sets of positive operators on a Hilbert
space $\mathcal{H}$, generating finite-dimensional von Neumann sub-algebras of $\mathcal{B}(\mathcal{H})$. Then there exist
finite-dimensional Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ such that $\{X_i\}$ can be mapped isomorphically into
the subalgebra of operators on $\mathcal{H}_1 \otimes \mathcal{H}_2$ of the form $A \otimes I$, and $\{Y_j\}$ can be mapped isomorphically
into the subalgebra of operators of the form $I \otimes B$.

The import of this result is that, under the stated assumptions, tensor product structure can 
be retrieved automatically as a special case of the general situation of commuting operators on a 
single space. Thus the special form of representation for Bell-type scenarios is not really necessary, 
although it is the one which is standardly used. 

Now we turn to events. For simplicity, we shall confine ourselves to the finite-dimensional
case. Recall that a self-adjoint operator $A$ has a spectral decomposition

$$A = \sum_i \alpha_i P_i$$

where $\alpha_i$ is the $i$'th eigenvalue, and $P_i$ is the projector onto the corresponding eigenspace.
Measuring a quantum state $\rho$ with this observable will result in one of the observable outcomes
$\alpha_i$, with probability $\mathrm{Tr}(\rho P_i)$, while the state will be projected into the corresponding eigenspace.

For simplicity of notation, we shall focus on dichotomic quantum observables, i.e.
self-adjoint operators on a Hilbert space $\mathcal{H}$ with a spectral resolution into two orthogonal subspaces.
In this case, we can use a standard two-element set $O = \{0, 1\}$ to label these outcomes, and the
sheaf $\mathcal{E}$ to record the collective outcomes of a compatible set of observables.

Thus for each basic measurement label $m$ in $X$, we have an observable $A_m$ with spectral
decomposition $A_m = \alpha^0_m P^0_m + \alpha^1_m P^1_m$, where $P^0_m + P^1_m = I$. Given a maximal set of commuting
observables $C = \{A_{m_1}, \ldots, A_{m_k}\}$, for each $s \in \mathcal{E}(C)$ we have a projector $P_s = P^{s(m_1)}_{m_1} \cdots P^{s(m_k)}_{m_k}$.
The composed observable $A_C = A_{m_1} \cdots A_{m_k}$ has a decomposition of the form

$$A_C = \sum_{s \in \mathcal{E}(C)} \alpha_s P_s$$

where $\alpha_s = \prod_i \alpha^{s(m_i)}_{m_i}$. To ensure that these eigenvalues can be associated with distinct outcomes,
we need that $\alpha_s = \alpha_t$ implies that $s = t$. This can be achieved by appropriate choices of the
eigenvalues $\alpha^i_m$.

It may well be the case that this decomposition contains redundant terms, in the sense that
$P_s = 0$ for some values of $s$. The important point is that these projectors do yield a resolution of
the identity:

$$\sum_{s \in \mathcal{E}(C)} P_s = (P^0_{m_1} + P^1_{m_1}) \cdots (P^0_{m_k} + P^1_{m_k}) = I \cdots I = I.$$

Now we consider empirical models. Suppose we are given an empirical model $e$ on a
measurement cover $\mathcal{M}$, which has a quantum representation in the form described above, based on
a Hilbert space $\mathcal{H}$. A quantum representation of $e$ is given by a state $\rho$ on $\mathcal{H}$. For each compatible
set of observables $C \in \mathcal{M}$, the state defines a probability distribution $\rho_C$ on $\mathcal{E}(C)$, by the standard
'statistical algorithm' of quantum mechanics: $\rho_C(s) = \mathrm{Tr}(\rho P_s)$. Thus $\rho_C \in \mathcal{D}_{\mathbb{R}_{\geq 0}}\mathcal{E}(C)$ for each
$C \in \mathcal{M}$.

An interesting point now arises: do the distributions $\{\rho_C\}$ necessarily form a compatible
family? In the case of a Bell-type scenario, the fact that they do is the content of the standard
no-signalling theorem [17]. However, Bell-type scenarios are very special cases of measurement
covers. We shall therefore verify explicitly that the distributions determined by a quantum state,
with respect to any family of sets of commuting observables, do form a compatible family in the
sense of sheaf theory. We can regard this as a generalized form of no-signalling theorem.

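Proposition 9.2 (Generalized No-Signalling) The family of distributions $\{\rho_C\}$ on families
of commuting observables defined by a quantum state $\rho$ are compatible on overlaps: for all $C, C'$:

$$\rho_C | C \cap C' = \rho_{C'} | C \cap C'.$$

Proof Firstly, we define $C_0 := C \cap C'$, $C_1 := C \setminus C_0$, and $C_2 := C' \setminus C_0$. Thus $C$ is the disjoint
union of $C_1$ and $C_0$, and $C'$ is the disjoint union of $C_2$ and $C_0$. Note that $\mathcal{E}(C) = \mathcal{E}(C_0) \times \mathcal{E}(C_1)$,
and $\mathcal{E}(C') = \mathcal{E}(C_0) \times \mathcal{E}(C_2)$. Thus we can write $s \in \mathcal{E}(C)$ as $s = (s_0, s_1)$, and similarly for sections
in $\mathcal{E}(C')$. In this notation, $P_{(s_0, s_1)} = P_{s_0} P_{s_1}$. Now we can calculate:

$$\begin{aligned}
\rho_C|C_0(s_0) &= \sum_{s_1 \in \mathcal{E}(C_1)} \rho_C(s_0, s_1) \\
&= \sum_{s_1 \in \mathcal{E}(C_1)} \mathrm{Tr}(\rho P_{(s_0, s_1)}) \\
&= \sum_{s_1 \in \mathcal{E}(C_1)} \mathrm{Tr}(\rho P_{s_0} P_{s_1}) \\
&= \mathrm{Tr}\Big(\sum_{s_1 \in \mathcal{E}(C_1)} \rho P_{s_0} P_{s_1}\Big) \\
&= \mathrm{Tr}\Big(\rho P_{s_0} \sum_{s_1 \in \mathcal{E}(C_1)} P_{s_1}\Big) \\
&= \mathrm{Tr}(\rho P_{s_0} I) \\
&= \mathrm{Tr}(\rho P_{s_0}).
\end{aligned}$$

A similar computation shows that $\rho_{C'}|C_0(s_0) = \mathrm{Tr}(\rho P_{s_0})$ as well. Hence $\rho_C | C \cap C' = \rho_{C'} | C \cap C'$. □
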
Thus we see that quantum mechanics obeys a general form of no-signalling, which applies to
compatible families of observables in general, not just those represented as operating on different
factors of a tensor product. This form of no-signalling says that, at the level of distributions, the
statistics obtained for a measurement on a given state are independent of the context of other
compatible measurements which may also have been performed.
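
The generalized no-signalling property can be checked on a small example in which the two contexts do not arise from a tensor-product split. The sketch below is illustrative and rests on our own choices: two qubits, the $\pm 1$-valued observables $X \otimes X$, $Z \otimes Z$ and $X \otimes I$, the contexts $\{X \otimes X, Z \otimes Z\}$ and $\{X \otimes X, X \otimes I\}$, and an arbitrary rational pure state. Outcome 0 is taken to label eigenvalue $+1$. Exact rational arithmetic on plain nested lists is used so that the equality of marginals is tested exactly.

```python
from fractions import Fraction as F
from itertools import product


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


I2, PX, PZ = [[1, 0], [0, 1]], [[0, 1], [1, 0]], [[1, 0], [0, -1]]
I4 = kron(I2, I2)
OBS = {"XX": kron(PX, PX), "ZZ": kron(PZ, PZ), "XI": kron(PX, I2)}


def projector(name, outcome):
    """Spectral projector (I + (-1)^o A) / 2 of a +/-1-valued observable A."""
    sign = 1 if outcome == 0 else -1
    a = OBS[name]
    return [[F(I4[i][j] + sign * a[i][j], 2) for j in range(4)] for i in range(4)]


# An arbitrary (hypothetical) pure state with rational density matrix.
psi = [1, 2, 3, 4]
norm_sq = sum(c * c for c in psi)
rho = [[F(a * b, norm_sq) for b in psi] for a in psi]


def rho_C(context, s):
    """Tr(rho P_s), where P_s is the product of the projectors selected by s."""
    p = I4
    for name in context:
        p = matmul(p, projector(name, s[name]))
    return trace(matmul(rho, p))


def marginal(context, m, o):
    """Marginal probability of outcome o for measurement m within the context."""
    total = F(0)
    for outs in product((0, 1), repeat=len(context)):
        s = dict(zip(context, outs))
        if s[m] == o:
            total += rho_C(context, s)
    return total


C1, C2 = ("XX", "ZZ"), ("XX", "XI")
for o in (0, 1):
    print(o, marginal(C1, "XX", o), marginal(C2, "XX", o))
```

The marginal distribution of $X \otimes X$ is the same in both contexts, even though $Z \otimes Z$ and $X \otimes I$ do not commute with each other and the shared observable is not localized on one tensor factor.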

We can in fact use Tsirelson's Theorem 9.1 to reduce this result to the standard form of
no-signalling theorem.† Given sets of commuting observables $C$ and $C'$, every operator in $C \cap C'$
commutes with every operator in the symmetric difference $C \,\triangle\, C'$, so Theorem 9.1 applies, and we
can represent these two sets of observables as acting on different factors of a tensor product. The
standard version of no-signalling can now be used to show that the marginals on the first factor
are independent of the choice of measurement on the second.

9.1. GHZ Models 

We shall briefly review how GHZ models, which were used in Section 6, can be represented in
quantum mechanics. For $n > 2$, we take the Hilbert space to be the tensor product of $n$ qubit
spaces. The local observables in each factor are the X and Y spin measurements, represented in
the Z basis by eigenvectors for spin Right or Left along the x-axis, with basis vectors

$$\frac{|\uparrow\rangle + |\downarrow\rangle}{\sqrt{2}}, \qquad \frac{|\uparrow\rangle - |\downarrow\rangle}{\sqrt{2}}$$

and similarly for spin Forward or Back along the y-axis, with basis vectors

$$\frac{|\uparrow\rangle + i|\downarrow\rangle}{\sqrt{2}}, \qquad \frac{|\uparrow\rangle - i|\downarrow\rangle}{\sqrt{2}}.$$

We shall label the outcomes as 0 for spin Right for X and spin Forward for Y; and 1 for spin Left
and spin Back respectively.

The model is then generated by the GHZ state [121 US], written in the Z basis as

$$\frac{|\uparrow \cdots \uparrow\rangle + |\downarrow \cdots \downarrow\rangle}{\sqrt{2}}.$$

If we measure each particle with a choice of X or Y observable, the probability for each outcome
is given by the square modulus of the inner product

$$|\langle \mathrm{GHZ} \mid b_1 \cdots b_n \rangle|^2,$$

where $b_i$ is the basis vector corresponding to the given outcome in the $i$'th component.

This computation is controlled by the product of the $|\downarrow\rangle$-coefficients of the basis vectors, and
hence by the cyclic group of order 4 generated by $i$.

The following table gives the coefficients of the $|\downarrow\rangle$ components labelled by measurement and
outcome:

$$\begin{array}{c|cc}
 & 0 & 1 \\ \hline
X & +1 & -1 \\
Y & +i & -i
\end{array}$$




The probability table for this model can be specified as follows: 

• Each row with an odd number of Y measurements has full support. 

• Each row in which the number of Y measurements is a multiple of 4 has as support those 
entries with an even number of outcomes labelled 1. 

• Each row in which the number of Y measurements is $4k + 2$ has as support those entries with
an odd number of outcomes labelled 1.

• In each case, the distribution is uniform on the support. 

Note that by 'row' here we mean a row of the probability table, i.e. the distribution on the set of 
sections over a given measurement context. 

Thus the interesting structure of this model arises purely from the support. 
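
The description of the probability table can be reproduced from the coefficient table alone. In the sketch below (our own illustrative encoding, not taken from the paper) each $|\downarrow\rangle$-coefficient is stored as the exponent $k$ with coefficient $i^k$, so that the product of coefficients is computed in the cyclic group of order 4; the squared modulus $|1 + i^k|^2$ takes the values 4, 2, 0, 2 for $k = 0, 1, 2, 3$, and the probability is this value divided by $2^{n+1}$, the normalization of the GHZ state and of the $n$ basis vectors.

```python
from fractions import Fraction
from itertools import product

# Exponent k such that the |down>-coefficient for (measurement, outcome) is i**k.
EXPONENT = {("X", 0): 0, ("X", 1): 2, ("Y", 0): 1, ("Y", 1): 3}
# |1 + i**k|^2 for k = 0, 1, 2, 3.
NORM_SQ = {0: 4, 1: 2, 2: 0, 3: 2}


def ghz_probability(measurements, outcomes):
    """|<GHZ | b_1 ... b_n>|^2 for the given choice of X/Y measurements and outcomes."""
    n = len(measurements)
    k = sum(EXPONENT[(m, o)] for m, o in zip(measurements, outcomes)) % 4
    return Fraction(NORM_SQ[k], 2 ** (n + 1))


def ghz_row(measurements):
    """One row of the probability table: the distribution over joint outcomes."""
    return {
        outs: ghz_probability(measurements, outs)
        for outs in product((0, 1), repeat=len(measurements))
    }


def support(measurements):
    return {outs for outs, p in ghz_row(measurements).items() if p > 0}


print(sorted(support(("X", "X", "X"))))  # even number of 1s
print(sorted(support(("X", "Y", "Y"))))  # odd number of 1s
print(len(support(("X", "X", "Y"))))     # full support
```

For $n = 3$, the row XXX is supported on the outcomes with an even number of 1s, the row XYY on those with an odd number of 1s, and any row with an odd number of Y measurements has full support, with uniform probabilities in every case.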

† We thank one of the journal referees for this observation.
9.2. Kochen-Specker Representations

We shall now discuss how the abstract discussion of Kochen-Specker situations in Section 7 can
be represented in terms of quantum mechanics.

We shall consider a particular form of dichotomic observables. Given unit vectors $e_1, \ldots, e_k$
representing distinct rays in a Hilbert space $\mathcal{H}$, we write

$$A_{e_i} := i \cdot P_{e_i} + 0 \cdot P_{e_i^{\perp}}.$$

Then we can take $\mathcal{X} = \{A_{e_1}, \ldots, A_{e_k}\}$ as a set of measurements. Note that $A_{e_i}$ commutes with
$A_{e_j}$ if and only if $e_i$ is orthogonal to $e_j$. Also, the composition of a set of commuting observables
$\{A_{e_i}\}_{i \in I}$ will have a spectral decomposition of the form

$$\sum_{i \in I} i \cdot P_{e_i} + 0 \cdot P_{\{e_i \mid i \in I\}^{\perp}}.$$

If we measure any state with this observable, the outcome must be that we get exactly one of the
branches $P_{e_i}$, with eigenvalue $i$; or that we get 'none of the above', corresponding to the branch
$P_{\{e_i \mid i \in I\}^{\perp}}$, with eigenvalue 0. Moreover, if the cardinality of $I$ equals the dimension of the Hilbert
space, then the latter case cannot apply.

If we now consider how outcomes are represented in the sheaf $\mathcal{E}$, we see that we indeed have
an a priori condition on those sections $s$ which can be in the support of a distribution coming
from a quantum state, as desired. Namely, using $x_i$ as a label for $A_{e_i}$, and taking $s(x_i) = 1$ for the
outcome corresponding to $P_{e_i}$ for this observable, we see that the only sections which are possible
are those which assign 1 to at most one measurement. Moreover, for those sets of compatible
observables whose cardinality equals the dimension of the space (which must necessarily be
maximal, and hence will appear in the measurement cover), exactly one measurement must be
assigned 1.

Thus if we take a set of unit vectors indexed by $X$, such that each vector is contained in at
least one orthonormal basis indexed by a subset of $X$, the measurement cover $\mathcal{M}$ represented by
the observables $A_{e_i}$ will have the following key property: for any quantum state $\rho$, the support of
the corresponding empirical model will satisfy the formula $\mathrm{ONE}(C)$ for each context $C$ in $\mathcal{M}$. So
the problem of exhibiting a state-independent form of strong contextuality has been reduced to
the problem of finding a Kochen-Specker graph, as described in Section 7.1.

9.3. Bell-Type Scenarios and Kochen-Specker Theorems

The measurement covers arising from Bell-type scenarios are a rather special class, which can be 
characterized as follows. 

Proposition 9.3 A measurement cover $\mathcal{M}$ arises from a Bell-type scenario if and only if it is
the family of maximal cliques of a graph $G = (X, E_G)$ which is the complement of an equivalence
relation $R$ on $X$:

$$E_G = \{\{x, y\} \mid \neg(x \mathrel{R} y)\}.$$

Proof Equivalence relations are in bijective correspondence with partitions $X = \bigsqcup_i X_i$. The
maximal cliques of $G$ are exactly the transversals of this partition, i.e. the sets $T \subseteq X$ such that
$T$ intersects with each $X_i$ in exactly one element. □
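
Proposition 9.3 is easy to confirm on small instances. The following sketch, with hypothetical measurement labels and helper names of our own, builds the compatibility graph of a partition (edges join measurements in different parts), enumerates its maximal cliques by brute force, and compares them with the transversals of the partition.

```python
from itertools import combinations, product


def bell_cover(parts):
    """Contexts of a Bell-type scenario: exactly one measurement from each part."""
    return [frozenset(choice) for choice in product(*parts)]


def compatibility_edges(parts):
    """Edges of the complement of the equivalence relation 'lies in the same part'."""
    part_of = {m: i for i, part in enumerate(parts) for m in part}
    return {
        frozenset((x, y))
        for x, y in combinations(part_of, 2)
        if part_of[x] != part_of[y]
    }


def maximal_cliques(vertices, edges):
    """Brute-force maximal cliques; suitable only for very small graphs."""
    vertices = list(vertices)
    cliques = [
        frozenset(c)
        for k in range(1, len(vertices) + 1)
        for c in combinations(vertices, k)
        if all(frozenset(pair) in edges for pair in combinations(c, 2))
    ]
    return {c for c in cliques if not any(c < other for other in cliques)}


# Hypothetical tripartite scenario with two measurement settings per part.
parts = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
vertices = [m for part in parts for m in part]
cliques = maximal_cliques(vertices, compatibility_edges(parts))
print(cliques == set(bell_cover(parts)), len(cliques))
```

In this example the eight maximal cliques coincide with the eight transversals, as the proposition predicts.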

Note in particular that in Bell-type scenarios, incompatibility is transitive. This is by no
means the general case. In terms of operators, $A$ and $B$ may commute, while $C$ may fail to
commute with either.

The more complex configurations typical of Kochen-Specker constructions can never arise 
from Bell-type scenarios. 

Proposition 9.4 Consider a measurement cover $\mathcal{M}$ of Bell type, and any quantum
representation of $\mathcal{M}$. For any $s \in \mathcal{E}(C)$ with $C \in \mathcal{M}$, there is a quantum state $\rho$ such that $s$
is in the support of $\rho_C$.

Proof Given $s$, we define the local state $\rho_i := |\psi_i\rangle\langle\psi_i|$ for each $i$, where $\psi_i$ is the eigenvector
corresponding to the outcome specified by $s$ for the measurement at $i$. Then the model defined
by the state $\rho := \rho_1 \otimes \cdots \otimes \rho_n$ has $s$ in its support. □

Hence there is no Kochen-Specker-type theorem for Bell-type scenarios. While, as we have 
seen, there are model-specific strong contextuality results, there are no generic results. The 
measurement covers arising from these scenarios are simply not rich enough in their combinatorial 
structure of overlapping intersections to support a result of this form. 

10. Postlude 

Our treatment of non-locality and contextuality makes a number of points: 

• Firstly, it is carried out at a high level of generality, and without any presupposition 
of quantum mechanics. None of the characteristic mathematical structures of quantum 
mechanics, such as complex numbers, Hilbert spaces, operator algebras, or projection lattices, 
are needed to expose the key structural issues. This characteristic is shared to some extent 
by other foundational approaches, such as generalized probabilistic theories [49], but these 
formalisms are still rather closer to that of quantum mechanics, and indeed have been directly 
suggested by it. Structures such as sheaves and presheaves varying over contexts can be seen 
as basic elements of a general 'logic of contextuality', and related structures have been used 
for a wide range of purposes, e.g. in the semantics of computation [53 jSH [S5]. This opens 
up the possibility of making some interesting connections between the study of non-locality 
and contextuality in physics, and ideas arising in other fields. 

• The sheaf-theoretic language, which directly captures the idea of structures varying over 
contexts, is a canonical setting for studying contextuality. Moreover, as we have seen, the 
gluing conditions and the existence of global sections capture the essential content of
non-locality and contextuality in a canonical mathematical form.

This opens the door to the use of the powerful methods of sheaf theory, which plays a major 
role in modern mathematics, in analyzing the structure of non-locality and contextuality. 
These notions are still poorly understood in multipartite and higher-dimensional settings. 
In [53], the first author, with Shane Mansfield and Rui Soares Barbosa, defines an abelian
presheaf based on the support of an empirical model. The Čech cohomology of this presheaf
with respect to the measurement cover is used to define a cohomological obstruction to locality
or non-contextuality, as a certain cohomology class. It is shown for a number of salient
examples, including PR boxes, GHZ states, the Peres-Mermin square, and the 18-vector
configuration from [21] giving a proof of the Kochen-Specker theorem in four dimensions,
that this obstruction does not vanish, thus yielding cohomological witnesses for contextuality.
While these results are preliminary, they suggest that the use of cohomological methods in 
the study of non-locality and contextuality has some promise. 

• The canonical form of description of the key concepts in terms of the existence of global 
sections largely replaces any explicit mention of hidden variables. These appear only in 
Section [5J in the context of a foundational result showing the equivalence of local hidden- 
variable realizations to the existence of global sections. On the other hand, empirical models, 
which can be seen as directly related to observation and experiment, play a prominent role 
throughout the paper. 

There is also an interesting conceptual point to be made in relation to incompatibility of 
measurements. Usually, this is taken to be a postulate of quantum mechanics, and specific to the 
quantum-mechanical formalism of non-commuting observables. However, in the light of general 
results such as those obtained in this paper, in a line of work going back to that of Fine [22], a 
different view emerges. The incompatibility of certain measurements can be interpreted as the 
impossibility — in the sense of mathematically provable non-existence — of joint distributions 
on all measurements which marginalize to yield the observed empirical distributions. Thus, if we 
refer to the experimental scenario with which we began Section 2, this shows that there cannot 
be, even in principle, any such scenario in which all measurements can be performed jointly, which
is consistent with the actually observed outcomes. 

Thus the incompatibility of certain measurements is revealed as a theory-independent 
structural impossibility result for certain families of empirical distributions. These families include 
those predicted by quantum mechanics, and confirmed by experiment; but the result itself is 
completely independent of quantum mechanics. Thus in this sense, we can say that incompatibility 
is derived rather than assumed. 

10.1. Related Work 

The present paper builds on our previous work, in particular [18] by the first author, and [54] [55] by
the second author (with H. Jerome Keisler and Noson Yanofsky, respectively). A natural direction
for generalization of the results in the present paper would be from the finite setting considered
here to the measure-theoretic one studied in [54]; note that the distribution functor can be defined
over general measure spaces [56].

Since we use sheaf theory as our mathematical setting, there is an obvious point of comparison 
with the topos approach, as developed by Isham, Butterfield, Döring, Heunen, Landsman, Spitters
et al. [S71i58,^.

The general point that presheaves varying over a poset of contexts provide a natural
mathematical setting for studying contextuality phenomena is certainly a common feature. It 
should also be mentioned that presheaves have been used for similar purposes in the context of 
the semantics of computation, e.g. in the Reynolds-Oles functor-category semantics for programs 
with state [301 IHI], and in the presheaf models for concurrency of Cattani and Winskel [52].

A more specific source of inspiration is the important insight in [57] , which initiated the whole 
topos approach, that the Kochen-Specker theorem could be reformulated very elegantly in presheaf 
terms, as stating the non-existence of global sections of a certain presheaf. 

On the other hand, there are several differences between the present work and the topos 
approach. For example, the topos approach focusses on a specific structure, the spectral presheaf, 
based on an operator algebra. In this sense, it uses concepts specific to quantum mechanics 
from the very start. Moreover, many of the key structures introduced in our work, such as the 
distribution functor and measurement covers, do not appear in the topos approach. One of our 
central objectives is to give a unified account of contextuality and non-locality, but locality issues 
have not been considered in the topos approach; nor has extendability, another key topic for us. 
It will, of course, be interesting to see if additional commonalities develop in future work. 

The appearance of negative probabilities in quantum mechanics has a long history. The 
Wigner quasi-probability distribution [13], further developed by Moyal [15], is a phase-space 
representation of quantum mechanics using negative probabilities. Feynman views such negative 
probabilities as a calculational convenience [16] . He explains that the appearance of a negative 
probability for a certain outcome does not invalidate the theory being used. Rather, this tells us 
that the relevant conditions cannot be realized, or that the outcome cannot be verified, or both. 
More specifically related to what we do, Sudarshan and Rothman [60] show that a local hidden-
variable analysis of the Bell model is possible, if certain values of the hidden variable arise with 
negative probability. Finally, in Dirac [14] , negative probabilities enter in the relativistic extension 
of quantum mechanics. 
Appendix

Firstly, we recall some set-theoretic notations. 

We write $|S|$ for the cardinality of a set $S$. If $f : X \to Y$ is a function and $X' \subseteq X$, we write
$f|X' : X' \to Y$ for the restriction of $f$ to $X'$. We write $Y^X$ for the set of functions from $X$ to $Y$,
and $\mathcal{P}(X)$ for the powerset of $X$.

A family of sets $\{X_i\}_{i \in I}$ is disjoint if $X_i \cap X_j = \emptyset$ whenever $i \neq j$. We write $\bigsqcup_{i \in I} X_i$ for
the union of a disjoint family. Given a disjoint family $\{X_i\}_{i \in I}$, there is an isomorphism

$$\mathcal{P}\Big(\bigsqcup_{i \in I} X_i\Big) \cong \prod_{i \in I} \mathcal{P}(X_i) \; :: \; S \mapsto (S \cap X_i)_{i \in I}.$$

A category has a collection of objects $A, B, C, \ldots$, and arrows $f, g, h, \ldots$. Each arrow has
specified domain and codomain objects: notation is $f : A \to B$ for an arrow $f$ with domain
$A$ and codomain $B$. Given arrows $f : A \to B$ and $g : B \to C$, we can form the composition
$g \circ f : A \to C$. Composition is associative, and there are identity arrows $\mathrm{id}_A : A \to A$ for each
object $A$, with $f \circ \mathrm{id}_A = f$, $\mathrm{id}_A \circ g = g$, for every $f : A \to B$ and $g : C \to A$.

Our main examples of categories will be Set, with sets as objects and functions as arrows;
and partially ordered sets $(P, \leq)$, where there is a single arrow from $p$ to $q$ if $p \leq q$, and none
otherwise. The opposite category $P^{\mathrm{op}}$ is the category formed from the opposite poset $(P, \geq)$.

If $\mathcal{C}$ and $\mathcal{D}$ are categories, a functor $F : \mathcal{C} \to \mathcal{D}$ assigns an object $FA$ of $\mathcal{D}$ to each object
$A$ of $\mathcal{C}$; and an arrow $Ff : FA \to FB$ of $\mathcal{D}$ to every arrow $f : A \to B$ of $\mathcal{C}$. These assignments
must preserve composition and identities: $F(g \circ f) = F(g) \circ F(f)$, and $F(\mathrm{id}_A) = \mathrm{id}_{FA}$.

A presheaf on a poset $P$ is a functor $P^{\mathrm{op}} \to \mathbf{Set}$.